ABSTRACT

We investigate how well the redshift distribution of a population of extragalactic ob- 
jects can be reconstructed using angular cross-correlations with a sample whose red- 
shifts are known. We derive the minimum variance quadratic estimator, which has 
simple analytic representations in very applicable limits and is significantly more sen- 
sitive than earlier proposed estimation procedures. This estimator is straightforward 
to apply to observations, it robustly finds the likelihood maximum, and it conveniently 
selects angular scales at which fluctuations are well approximated as independent be- 
tween redshift bins and at which linear theory applies. We find that the linear bias 
times number of objects in a redshift bin generally can be constrained with cross-
correlations to fractional error $\sqrt{10^2 N_{\rm bin}/\mathcal{N}}$, where $\mathcal{N}$ is the total number of spectra
per dz and $N_{\rm bin}$ is the number of redshift bins spanned by the bulk of the unknown
population. The error is often independent of the sky area and sampling fraction. Furthermore,
we find that sub-percent measurements of the angular source density per
unit redshift, dN/dz, are in principle possible, although cosmic magnification needs to
be accounted for at fractional errors of < 10 per cent. We discuss how the sensitivity
to dN/dz changes as a function of photometric and spectroscopic depth and how to
optimize the survey strategy to constrain dN/dz. We also quantify how well cross- 
correlations of photometric redshift bins can be used to self-calibrate a photometric 
redshift sample. Simple formulae that can be quickly applied to gauge the utility of 
cross correlating different samples are given. 
1 INTRODUCTION

In many spectral bands (e.g., the radio, microwave,
infrared, and X-ray), the redshift distribution of a source
population is difficult to determine. Even in the optical,
where photometric techniques are widely applied to estimate 
source redshifts, these techniques work better for certain 
galaxy types than for others. However, extragalactic objects 
that are close together on the sky are also likely to be close 
in redshift. Thus, angular cross-correlations between popu- 
lations with poorly known redshifts and those with better 
known redshifts can be used to improve the determination 
of the former's redshift distribution. Such reconstruction has 
a wide range of applications, from ascertaining the redshift 
distribution of diffuse backgrounds to calibrating photomet- 
ric redshifts for the next generation of large-scale structure 
surveys. 

Several previous studies have attempted to measure a 
population's redshift distribution, dN/dz, by using its constituents'
proximity on the sky to sources with known redshifts,
i.e., by computing angular cross-correlation statistics
between the two populations (Phillipps & Shanks 1987;
Erben et al. 2009). Similar techniques have been used to 
search for contamination in photometrically selected red- 
shift slices or to bound the median redshift of a sample 
(Padmanabhan et al. 2007; Erben et al. 2009; Benjamin 
et al. 2010, 2012). Different dN/dz cross-correlation esti- 
mators have also been studied theoretically (Phillipps 1985; 
Newman 2008; Matthews & Newman 2010; Schulz 2010; 
Matthews & Newman 2012). However, it is unknown how 
close any of these estimators are to being optimal. It is also 
unclear which survey specifications (depth, area, sampling 
fraction, etc.) are best for reconstructing the redshift distri- 
bution of an unknown population. 

© 2013 RAS

M. McQuinn and M. White

This paper attempts to answer these questions. We
write down the optimal dN/dz estimator and show that
in widely applicable limits, intuitive formulae describe how
well the redshifts of a given source population can be constrained
from a population whose redshift distribution is
better known. In the limit of a dense spectroscopic survey, 
we show that the fractional error in the number of galaxies 
in the unknown population that fall in spectroscopic redshift 
bin z can be estimated to the precision 
where $f_{\rm sky}$ is the sky coverage of the survey, $\ell_0$ is the multipole
at which shot noise becomes equal to intrinsic clustering
in either sample, and $\beta(z)$ is the fraction of the unknown
auto-power (at multipoles less than $\ell_0$) that arises from redshift
bin z. However, the result is even simpler in the limit of
a sparse spectroscopic sample, having fewer than a thousand
objects per sq. deg. per $\Delta z$:
where $\mathcal{N}^{(s)}$ is the total number of spectra per unit redshift.
In this 'rare spectroscopic sample' limit, the fractional error 
on N(z) depends on the total number of spectra but not 
separately on the density of spectra, the sky area, or the 
fraction of objects with spectra. 

Angular cross-correlations to determine redshifts have 
applications beyond estimating dN/dz. For example, they 
could be used to measure the redshifts of unresolved cosmic 
infrared background anisotropies (as was done in Kashlinsky
et al. 2007) or to isolate foregrounds in cosmic microwave
background (CMB) and high-redshift 21 cm maps. Angular 
cross-correlations can additionally be used to reconstruct 
three-dimensional correlations from angular clustering mea- 
surements (Seljak 1998; Padmanabhan et al. 2007). Further- 
more, such cross-correlations are able to calibrate photomet- 
ric redshift errors even when the spectroscopic population is 
not intrinsically identical to the unknown population. Ap- 
plications that are not in the vein of precision cosmology 
likely need no better than a 10 per cent fractional constraint 
on dN/dz. However, percent-level or even better calibration 
of photometric redshifts is required to prevent redshift er- 
rors from being the limiting factor for cosmological param- 
eter estimates with the next generation of weak lensing sur- 
veys (Huterer et al. 2006; Schneider et al. 2006; Bernstein 
& Huterer 2010; Zhang et al. 2010). 1 

There is a wide range of surveys to which cross-
correlation techniques could be applied. Recent spectro- 
scopic surveys have gone wide over hundreds (Driver et al. 
2011) or thousands of square degrees (Eisenstein et al. 2001; 
Colless et al. 2001; Drinkwater et al. 2010; SDSS-III Col- 
laboration et al. 2012a) or deep over ~ 1 sq. deg. patches 
(Le Fevre et al. 2005; Newman et al. 2012). Some are com- 
plete to a magnitude limit, whereas others more sparsely 
sample the sources (Lawrence et al. 1999; Eisenstein et al. 
2001; Kochanek et al. 2012). The large spectroscopic data 
sets that should be available in the next decade include: 2 

1 While photometric redshifts are object-specific, in practice
weak lensing studies will likely use the statistical distribution from
photometric redshifts owing to catastrophic errors (Cunha et al.
2009; Mandelbaum et al. 2008). In contrast, cross-correlations are
not able to measure the redshifts of individual objects, but they
are another way to measure this statistical distribution.

2 http://www.sdss.org, http://www.gama-survey.org, http: 



• the Baryon Oscillation Spectroscopic Survey (BOSS)
galaxy sample, covering 10,000 deg$^2$ with 1.5 million redshifts
of massive galaxies extending to z ~ 0.7 (Dawson
et al. 2013),

• the Sloan Digital Sky Survey (SDSS)+BOSS quasar
sample, covering 10,000 deg$^2$ with $2 \times 10^5$ redshifts (Schneider
et al. 2010; Shen et al. 2011; SDSS-III Collaboration
et al. 2012b),

• the Galaxy And Mass Assembly (GAMA) survey, covering
310 deg$^2$ with redshifts for $3.4 \times 10^5$ galaxies to an r-band
magnitude limit of 19.8 (Driver et al. 2011),

• DEEP2 (Newman et al. 2012), the VIMOS-Very Large
Telescope Deep Survey (VVDS; Le Fèvre et al. 2005), the
z-Cosmology Evolution Survey (zCOSMOS; Lilly et al. 2007)
and, while not technically spectroscopic, COMBO-17 (Wolf
et al. 2003), each with ~ $10^4$-$10^5$ redshifts in ~ 1 deg$^2$
fields,

• the HETDEX survey gathering $10^6$ Ly$\alpha$-emitting galaxies
over 200 deg$^2$ at 1.8 < z < 3.8 (Hill et al. 2008),

• 21 cm emission line surveys over wide fields with, e.g.,
the Australian Square Kilometer Array Pathfinder (ASKAP;
Johnston et al. 2008), which aims for ~ $10^6$ galaxies to z <
0.43 (Duffy et al. 2012).

The proposed projects eBOSS and BigBOSS would in- 
crease the number of spectroscopically identified galaxies 
and quasars by an order of magnitude over the existing 
SDSS + BOSS samples (Schlegel et al. 2011). 3 Ultimately 
the Square Kilometer Array (projected for 2020) aims to 
capture a billion galaxies over half the sky (Rawlings et al. 
2004). 

In addition, we are entering a new age of optical 
photometric surveys, with the Kilo Degree Survey (KIDS;
1,500 deg$^2$ reaching an i-band magnitude limit of i = 23),
the Dark Energy Survey (DES; 5,000 deg$^2$ to i = 25) and the
HyperSuprimeCam Project (HSC; 2,000 deg$^2$ to i = 26.2)
all currently gathering data. These surveys$^4$ will be followed
in the next decade by the Large Synoptic Survey Telescope (LSST),
which aims to constrain the cosmological model using a
"gold sample" of galaxies with i < 25.3 over half of the
sky, and Euclid, which will provide high-resolution images
of galaxies out to z ~ 2 over 15,000 deg$^2$. While we do not
model in detail any particular survey, we use the above to 
guide our discussion. 

Fig. 1 shows characteristic number densities with red- 
shift for some of the aforementioned spectroscopic surveys 
as well as for complete surveys to the specified i band limit- 
ing magnitude. For these and ensuing calculations, we have 
parametrized the galaxy redshift probability distribution for 



//deep.ps.uci.edu, http://cesam.oamp.fr/vvdsproject/,
http://archive.eso.org/archive/adp/zCOSMOS/VIMOS_spectroscopy_v1.0/

3 http://www.sdss3.org/future/eboss.php, http://bigboss.lbl.gov

4 http://kids.strw.leidenuniv.nl/, http://www.darkenergysurvey.org,
http://www.naoj.org/Projects/HSC/HSCProject.html, http://www.lsst.org/lsst/,
http://sci.esa.int/euclid.
Figure 1. Shown are the dN/dz of different galaxy populations.
The dashed curves are for surveys complete up to magnitude limits
of i = 21, 23, and 25.3, calculated via Eq. (3). Also shown are
estimates for the density of SDSS+BOSS spectroscopic quasars 
and for the future combined density of luminous red galaxies, 
emission line galaxies, and quasars with BigBOSS. The red solid 
curves represent the critical densities for whether a sample is in 
the rare galaxy limit (Section 3.4). 

an i-band magnitude limited sample as 
z = 0.0417 i - 0.74,

with a total angular number density of $1.7 \times 10^{5+0.31(i-25)}$ deg$^{-2}$
(Hoekstra et al. 2006; LSST Science Collaboration et al. 2009,
calibrated over the range
20.5 < i < 25.5; see also Efstathiou et al. 1991; Brainerd 
et al. 1996; Benjamin et al. 2010; Hildebrandt et al. 2012). 
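
As a quick numerical illustration of the two scalings quoted above, the following minimal sketch evaluates them at the magnitude limits used in Fig. 1. The function names are ours (hypothetical), the exponent is read as 0.31(i - 25), and the relations are only calibrated over 20.5 < i < 25.5, so values outside that range are extrapolations.

```python
def z0_from_i_limit(i_lim):
    """Redshift scale from the relation quoted in the text: z = 0.0417 i - 0.74."""
    return 0.0417 * i_lim - 0.74


def total_density_per_deg2(i_lim):
    """Total angular number density in deg^-2: 1.7 x 10^(5 + 0.31 (i - 25))."""
    return 1.7 * 10.0 ** (5.0 + 0.31 * (i_lim - 25.0))


if __name__ == "__main__":
    # Magnitude limits of the dashed curves in Fig. 1.
    for i_lim in (21.0, 23.0, 25.3):
        print(i_lim, z0_from_i_limit(i_lim), total_density_per_deg2(i_lim))
```

The density rises by roughly a factor of two per magnitude, which is why the deepest photometric samples sit far above the spectroscopic curves in Fig. 1.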

Cross-correlation techniques can also be applied to 
maps in the X-ray such as those made with the X-ray Multi-Mirror
Mission (XMM-Newton), in the ultraviolet such as
with the Galaxy Evolution Explorer (GALEX), in the
infrared such as with the Wide-field Infrared Survey Explorer
(WISE) and the Herschel Space Observatory, in the microwave
such as with the Atacama Cosmology Telescope (ACT)
and the South Pole Telescope (SPT), and in the radio such
as with ASKAP.$^5$ For many of these surveys, the angular
resolution or depth makes redshift identification using overlapping
optical surveys difficult. Cross-correlations offer an
independent means to gauge redshifts.

5 http://xmm.esac.esa.int, http://www.galex.caltech.edu,
http://wise.ssl.berkeley.edu, sci.esa.int/herschel/,
http://www.princeton.edu/act/, http://pole.uchicago.edu,
http://www.atnf.csiro.au/projects/mira/

This paper is organized as follows. Section 2 sets up
the formalism used in this paper and applies it to an idealized
dN/dz problem for illustration. Section 3 provides
intuition into the mechanics of the optimal estimator and
discusses what scales contain the bulk of the information, 
setting the ground for the relevant examples discussed in 
Section 4. Section 5 generalizes our Fourier space results to 
configuration space and compares our estimator to the more
familiar Newman (2008) estimator. Section 6 quantifies the 
estimator biases that result from common simplifying ap- 
proximations. Penultimately, Section 7 shows how the re- 
sults of the previous sections apply to photometric redshift 
calibrations. Finally, Section 8 demonstrates our estimator 
on mock surveys and is followed by our conclusions. We defer 
some technical details and derivations to a series of Appendices,
which are referenced in the text. The numerical calculations
in this study take a flat $\Lambda$CDM cosmological model
with $\Omega_m = 0.27$, $\Omega_\Lambda = 0.73$, $h = 0.71$, $\sigma_8 = 0.82$, $n_s = 0.96$,
and $\Omega_b = 0.046$, consistent with recent measurements (Larson
et al. 2011). We treat the background cosmology as perfectly
known in all calculations. Roman indexes {i, j, k} run
from 1 to some maximum integer whilst Greek indexes start 
from 0, and repeated indexes that do not appear in the same 
quantity are summed. Table 1 provides definitions of some 
commonly appearing symbols. 



2 BASIC FORMALISM 

We begin by introducing our notation and physical model, 
before deriving the most general form for our dN/dz estima- 
tor and applying it to idealized, illustrative examples. Useful 
limits of our expressions are taken in Section 3, where we 
also build intuition for the shape of the estimator. 



2.1 Model and notation 

Initially we will discuss galaxy clustering in the spherical 
harmonic basis as our covariance matrix is maximally sparse 
in this space. We shall write expressions as if the galaxy 
samples cover the full sky, but often finite sky coverage can 
be included by simply multiplying by the sky covering fraction
($f_{\rm sky}$). Section 5.1 generalizes our estimation methods
to configuration space, while Section 5.3 discusses the gen- 
eralization to finite sky coverage. 

We denote the multipole moments of a 'photometric' 
population of objects with unknown redshifts and a 'spec- 
troscopic' sample in which the redshifts are perfectly known 
as 

$$p(\ell,m) = N^{(p)} \delta^{(p)}(\ell,m) = N_i^{(p)} \delta_i^{(p)}(\ell,m), \qquad (4)$$

$$s_i(\ell,m) = N_i^{(s)} \delta_i^{(s)}(\ell,m), \qquad (5)$$

respectively. Here, $1 \le i \le N_{\rm bin}$ labels the redshift bin spanning
the range $z_{i-1}$-$z_i$, where the $z_i$ are ordered in increasing
redshift, and $\delta^{(x)} = x/\langle x \rangle - 1$ is the overdensity
in population x, where x denotes an angular source density
field with $\langle x \rangle = N^{(x)}$, the mean density per unit area.
Our calculations are more general than the case of a spectroscopic
and photometric galaxy sample: the photometric
sample can be thought of as any sample for which the redshifts
are unknown and the spectroscopic as one for which
they are known to precision $\Delta z/2$. Our ultimate aim is to
use a survey's estimates for the left-hand sides of Eqs. (4)
and (5), $p(\ell,m)$ and $s_i(\ell,m)$, to estimate the $N_i^{(p)}$.

symbol              description

$p(\ell,m)$         multipole moment of photometric population (subscript i refers to the subsample within redshift bin i)
$\mathbf{s}(\ell,m)$   vector of multipole moments of spectroscopic z-bins ($s_i$ is component in bin i)
$N_i^{(x)}$         average density per unit area in population x in redshift bin i; $N^{(x)} = \sum_i N_i^{(x)}$
$dN^{(x)}/dz$       equal to $N_i^{(x)}/\Delta z_i$, where the subscript i is dropped if redshift-independent
$\mathcal{N}_i^{(s)}$   total number of spectroscopic galaxies per unit redshift in redshift bin i
$w_i^{(xy)}(\ell)$  stochastic component of the cross power between samples x and y; $w_i^{(x)} \equiv w_i^{(xx)}$
$b_i^{(x)}$         linear bias of population x in redshift bin i
$C_{ij}(\ell)$      matter density angular cross power spectrum between bins i and j
$A(\ell,m)$         covariance matrix of $p(\ell)$ with $\mathbf{s}(\ell)$, with index 0 referring to p
$\ell_0$            multipole where shot noise is equal to cosmic variance
$N_{\rm bin}$       number of redshift bins used in analysis
$D(z)$              growth factor such that D(0) = 1; $D_i \equiv D(z_i)$
$\chi$              the conformal distance; $d\chi = (1 + z)\,dt$
$S(\ell)$           the 'Schur parameter' (Eq. 29); $S \ge 1$, with equality holding in the rare limit
$\beta_i(\ell)$     fraction of total photometric power contributed by redshift bin i (Eq. 43)
$n$                 local power-law index of the density power spectrum such that $P(k) \sim k^n$
$\ell_{\rm NL}$     multipole at which linear theory errs at a factor of 2 (Eq. 34)

Table 1. Definitions of commonly appearing symbols. The arguments are often dropped in the text, and a hat on any symbol indicates
an estimated value.

Our discussion will be couched in terms of constrain- 
ing the for which the Azi need to be chosen to be 
sufficiently narrow in order that there are not significant 
gradients in dN^ jdz across the bin. However, in many 
cases, particularly when the sensitivity to cross correlations 
is marginal, a smoother parametrization of dN^ p ' /dz may be 
desirable. Our error estimates can be easily translated into 
the errors on other parameterizations of dN^ jdz (like its 
mean and variance or the empirically motivated parameter- 
ization of a power-law times an exponential; see Appendix 
A3 for more details). 

We model the $s_i(\ell,m)$ as Gaussian random variables
with auto power spectrum

$$\langle s_i s_j \rangle(\ell) = N_i^{(s)} N_j^{(s)} b_i^{(s)} b_j^{(s)} C_{ij}(\ell) + w_i^{(s)} \delta^K_{ij}, \qquad (6)$$

where we have dropped the m dependence as different modes
are orthogonal by statistical isotropy but have the same
auto-power. We denote by $C_{ij}$ the cross power between the
matter overdensity in the i and j slices, and by $b_i^{(x)}$ the linear
bias of population x in redshift bin i. The expression for the
shot noise piece $w_i^{(xy)}$ in the halo model results from taking
the large-scale limit of the one-halo term (see e.g. Cooray &
Sheth 2002, for a review):

$$w_i^{(xy)} = \int d\chi \int dm_h\, n_h(m_h)\, \langle n^{(x)} n^{(y)} | m_h \rangle, \qquad (7)$$

where $n_h(m_h)$ is the halo mass function and $\langle n^{(x)} n^{(y)} | m_h \rangle$
is the number of galaxies of type x in a halo of mass $m_h$ times
that of type y, averaged over all haloes at fixed mass.
This large-scale limit is a good approximation at the angular
scales we consider. We adopt the simplifying notation
$w_i^{(x)} \equiv w_i^{(xx)}$.$^6$ We note that a measurement of the $N_i^{(p)}$ is not limited
by sample variance, and it can be perfectly measured in the
limit that the stochastic component is zero.

6 The normalization of the stochastic component can potentially
be reduced for dense samples by differently weighting sources (Seljak
et al. 2009; Hamaus et al. 2010) instead of the galaxy number
weighting used here.

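The cross power spectrum of $s_i(\ell)$ and $p(\ell)$ is

$$\langle p\, s_i \rangle(\ell) = \sum_{j=1}^{N_{\rm bin}} N_j^{(p)} N_i^{(s)} b_j^{(p)} b_i^{(s)} C_{ij}(\ell) + w_i^{(ps)}. \qquad (8)$$

Finally,$^7$

$$\langle p^2 \rangle(\ell) = \sum_{i=1}^{N_{\rm bin}} \sum_{j=1}^{N_{\rm bin}} \left[ N_i^{(p)} N_j^{(p)} b_i^{(p)} b_j^{(p)} C_{ij}(\ell) + w_i^{(p)} \delta^K_{ij} \right]. \qquad (9)$$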

We will add to Eqs. (6), (8) and (9) the generally smaller 
terms that owe to cosmic magnification later. 

While our formalism is completely general, subsequent
calculations (and the figures we present) assume

$$b_i^{(x)} = D(z_i)^{-1}, \qquad (10)$$

where D(z) is the linear growth factor normalized so that
D(0) = 1, and we will interchangeably use $\chi$ and z for its argument.
This bias leads to redshift-independent clustering,
appropriate for several cosmological populations, especially
if they are rare objects. In many instances this assumption
will be benign, and our results can be simply rescaled by
fixing $N_i^{(x)} b_i^{(x)}$. We also assume

$$w_i^{(ps)} = f_{\rm over} \min[w_i^{(s)}, w_i^{(p)}], \qquad (12)$$

for the stochastic component of the power. We take the
'overlap fraction' to be $f_{\rm over} = 1$ unless stated otherwise
(which means that the rarest $\min[N_i^{(s)}, N_i^{(p)}]$ sources are
the same in both samples). In addition, we take a satellite
fraction of zero. Increasing it to 25 per cent - the
largest fraction found for the relevant galaxies in Wetzel &
White (2010, see their figs. 8 & 12) - does not change our
results appreciably.$^8$

7 The total linear bias of the photometric sample is
$b^{(p)} = \sum_{i=1}^{N_{\rm bin}} N_i^{(p)} b_i^{(p)} / N^{(p)}$, which provides a potentially helpful integral
constraint to break the $b_i^{(p)}$-$N_i^{(p)}$ degeneracy.
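
To make the fiducial bias model of Eq. (10) and the overlap prescription of Eq. (12) concrete, here is a minimal sketch. It assumes the flat $\Lambda$CDM parameters quoted in the Introduction and uses the standard growth integral for a matter plus cosmological constant universe, $D(a) \propto E(a) \int_0^a da' / [a' E(a')]^3$. The function names and the Simpson-rule integrator are our own illustrative choices, not part of the paper.

```python
import math

OMEGA_M, OMEGA_L = 0.27, 0.73  # fiducial flat LCDM values quoted in the text


def _growth_unnormalized(a, n_steps=2000):
    """E(a) * integral_0^a da' / (a' E(a'))^3, evaluated with Simpson's rule."""
    def integrand(x):
        # 1 / (x E(x))^3 rewritten so that it is finite at x = 0.
        return x ** 1.5 / (OMEGA_M + OMEGA_L * x ** 3) ** 1.5

    h = a / n_steps  # n_steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(a)
    for k in range(1, n_steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    e_of_a = math.sqrt(OMEGA_M / a ** 3 + OMEGA_L)
    return e_of_a * total * h / 3.0


def growth_factor(z):
    """Linear growth factor normalized so that D(0) = 1."""
    return _growth_unnormalized(1.0 / (1.0 + z)) / _growth_unnormalized(1.0)


def linear_bias(z):
    """Fiducial bias of Eq. (10): b = 1 / D(z)."""
    return 1.0 / growth_factor(z)


def cross_stochastic(w_s, w_p, f_over=1.0):
    """Stochastic cross power of Eq. (12): f_over * min(w_s, w_p)."""
    return f_over * min(w_s, w_p)


if __name__ == "__main__":
    for z in (0.0, 0.5, 1.0, 2.0):
        print(z, growth_factor(z), linear_bias(z))
```

With this bias the product $b_i D_i$ is unity in every bin, which is what makes the clustering amplitude redshift-independent.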

The cross power in the matter overdensity is

$$C_{ij}(\ell) = \int_0^\infty \frac{2 k^2 dk}{\pi}\, a_i(k, z_i)\, a_j(k, z_j)\, P(k), \qquad (13)$$

$$a_i(k, z_i) = \int_0^\infty d\chi\, D(\chi)\, W_i(\chi)\, j_\ell(k\chi), \qquad (14)$$

where, in our top-hat $N_i^{(x)}$ basis, $W_i = \Delta\chi_i^{-1}$ for redshifts
that fall in the range $z_{i-1}$-$z_i$ and zero otherwise. (For a
discussion of how to evaluate $j_\ell$ and these highly-oscillatory
integrals over $j_\ell$ numerically see Appendix D.) While not
required, we have assumed linear theory such that P(k) is
the z = 0 linear-theory matter overdensity power spectrum.
Eq. (13) ignores redshift space distortions (RSDs). RSDs 
contribute a small fraction to the angular fluctuations on 
relevant angular scales (Appendix B). 

We note that linear scales can only be used to reconstruct
the product of the large-scale bias, $b_i^{(p)}$, and the number
density, $N_i^{(p)}$, at any redshift (Newman 2008; Bernstein
& Huterer 2010; Schulz 2010) as they always appear in combination.
This product is sometimes the desired quantity
(e.g., when cleaning a map of diffuse backgrounds), but for
many applications it is $N_i^{(p)}$ itself that is desired. We discuss
methods for breaking this degeneracy in Section 9. We
will often write our constraints as on $N_i^{(p)}$ for notational
simplicity, but please note that the constraints we quote are
always on the combination $b_i^{(p)} N_i^{(p)}$.



2.2 Estimator 

To simplify notation, we define the combined covariance ma- 
trix of the photometric survey and the redshift slices of the 
spectroscopic survey: 



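$$A(\ell,m) \equiv \left\langle \begin{pmatrix} p(\ell,m)^* \\ \mathbf{s}(\ell,m)^* \end{pmatrix} \begin{pmatrix} p(\ell,m) & \mathbf{s}(\ell,m)^T \end{pmatrix} \right\rangle, \qquad (15)$$

where $\mathbf{s}^T = (s_1, \ldots, s_n)$. The argument $(\ell,m)$ will typically
be dropped in subsequent expressions. The minimum variance
estimator for $N_i^{(p)}$ that maximizes the likelihood function
if it is Gaussian in this parameter near the maximum
(as is likely if many modes are included in the estimate) is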


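$$\begin{aligned}
\hat{N}_i^{(p)} &= [\hat{N}_i^{(p)}]_{\rm last} + \frac{1}{2} [F^{-1}]_{ij} \sum_{\ell,m} \Big( \mathbf{x}^\dagger A^{-1} A_{,j} A^{-1} \mathbf{x} && (16) \\
&\qquad\qquad - {\rm Tr}[A^{-1} A_{,j}] \Big), && (17)
\end{aligned}$$

with $\mathbf{x} \equiv (p, \mathbf{s}^T)^T$ the data vector whose covariance is given by Eq. (15)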


(e.g. Bond et al. 1998; Tegmark et al. 1998; Dodelson 2003),
where all repeated indexes are summed and subscript ', i' indicates
a derivative with respect to the i-th parameter, which
for most of our discussion is the parameter $N_i^{(p)}$. One can
also trivially recast this to $b_i^{(p)} N_i^{(p)}$, the quantity that is
truly constrained. Appendix A2 derives Eqs. (16) and (17)
and shows how they generalize to the case with priors on
the $N_i^{(p)}$. The parameter $[\hat{N}_i^{(p)}]_{\rm last}$ is initially a guess and,
for subsequent iterations, the previous estimate.

8 In the case of $f_{\rm over} = 1$ and equal numbers in both the s and
p samples, both populations trace the same large-scale cosmological
plus stochastic perturbations and the $N_i^{(p)}$ can be perfectly
estimated.

In the limit that many modes are included in the estimate (which is
appropriate; Appendix A1),



$$F_{ij} = \frac{1}{2} \sum_{\ell,m} {\rm Tr}[A^{-1} A_{,i} A^{-1} A_{,j}] \qquad (18)$$

and F is the Fisher matrix. The estimator in this limit is the
minimum variance quadratic estimator, and the variance of
this estimator is $[F^{-1}]_{ii}$ (e.g., Tegmark et al. 1997). We will
use Eq. (18) in our subsequent calculations.
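
The Fisher matrix of Eq. (18) is straightforward to evaluate numerically. The sketch below does so for a toy three-bin configuration under several simplifying assumptions that are ours rather than the paper's: the angular power is taken to be diagonal in redshift bin, the stochastic terms are held fixed when differentiating, the parameters are $\theta_i = b_i^{(p)} N_i^{(p)}$, each multipole contributes $f_{\rm sky}(2\ell+1)$ modes, and `toy_c_of_ell` is a hypothetical power-law spectrum in arbitrary units.

```python
import numpy as np


def covariance(theta, n_s, c, w_p, w_s, w_ps):
    """Covariance of (p, s_1..s_n) for one multipole, built from Eqs. (6), (8), (9)
    with C_ij diagonal. theta[i] = b_i^(p) N_i^(p); n_s[i] = b_i^(s) N_i^(s)."""
    n = len(theta)
    A = np.zeros((n + 1, n + 1))
    A[0, 0] = np.sum(theta ** 2 * c) + w_p
    A[0, 1:] = theta * n_s * c + w_ps
    A[1:, 0] = A[0, 1:]
    A[1:, 1:] = np.diag(n_s ** 2 * c + w_s)
    return A


def covariance_derivative(i, theta, n_s, c):
    """dA/dtheta_i with the stochastic terms held fixed (an assumption)."""
    n = len(theta)
    dA = np.zeros((n + 1, n + 1))
    dA[0, 0] = 2.0 * theta[i] * c[i]
    dA[0, i + 1] = dA[i + 1, 0] = n_s[i] * c[i]
    return dA


def fisher_matrix(theta, n_s, ells, c_of_ell, w_p, w_s, w_ps, f_sky=1.0):
    """Eq. (18): F_ij = (1/2) sum over modes of Tr[A^-1 A_,i A^-1 A_,j]."""
    n = len(theta)
    F = np.zeros((n, n))
    for ell in ells:
        c = c_of_ell(ell)
        A_inv = np.linalg.inv(covariance(theta, n_s, c, w_p, w_s, w_ps))
        M = [A_inv @ covariance_derivative(i, theta, n_s, c) for i in range(n)]
        n_modes = f_sky * (2 * ell + 1)
        for i in range(n):
            for j in range(n):
                F[i, j] += 0.5 * n_modes * np.trace(M[i] @ M[j])
    return F


def toy_c_of_ell(ell, n_bins=3):
    """Hypothetical power-law angular spectrum, identical in every bin."""
    return np.full(n_bins, 1.0e-4 * (ell / 100.0) ** -1.2)


def example_fisher(f_sky):
    theta = np.full(3, 1000.0)
    n_s = np.full(3, 100.0)
    return theta, fisher_matrix(theta, n_s, range(10, 500), toy_c_of_ell,
                                w_p=3000.0, w_s=np.full(3, 100.0),
                                w_ps=np.zeros(3), f_sky=f_sky)


if __name__ == "__main__":
    theta, F = example_fisher(f_sky=0.01)
    print(np.sqrt(np.diag(np.linalg.inv(F))) / theta)  # fractional errors
```

Because every mode enters with the same weight, the Fisher matrix in this sketch scales linearly with the sky fraction, so the forecast errors scale as the inverse square root of the area.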

Schulz (2010) and Matthews & Newman (2012) considered
a maximum likelihood estimator approach to constrain
the $N_i^{(p)}$, at least for their most general expressions. This approach
should yield similar estimates to ours as the Fisher
matrix, which sets our variance, saturates the Cramér-Rao
bound (and so is optimal). In fact, quadratic estimators are
prone to find local extrema and so a Markov Chain Monte
Carlo approach to find the maximum likelihood may yield
more robust estimates (e.g., Christensen et al. 2001). However,
the linearity of our estimator reduces the severity of
this problem, and we show in Section 8 that it robustly finds
the true likelihood maximum even when the initial guess for the $N_i^{(p)}$
is off by orders of magnitude.
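
The role that linearity plays can be seen in a small noise-free experiment, sketched below under assumptions of our own: a single multipole band, a diagonal $C_{ij}$, no stochastic cross term, and the photometric auto-power $A_{00}$ held fixed at its measured value, so that the model covariance is linear in the parameters $\theta_i = b_i^{(p)} N_i^{(p)}$. In that case one application of the update in Eq. (16), with the Fisher matrix of Eq. (18), returns the true parameters from a starting guess that is a factor of 100 too low. All names and numbers are illustrative.

```python
import numpy as np


def model_cov(theta, a00, n_s, c, w_s):
    """Covariance of (p, s_1..s_n) with A_00 held fixed at the value a00."""
    n = len(theta)
    A = np.zeros((n + 1, n + 1))
    A[0, 0] = a00
    A[0, 1:] = theta * n_s * c  # cross terms, linear in theta
    A[1:, 0] = A[0, 1:]
    A[1:, 1:] = np.diag(n_s ** 2 * c + w_s)
    return A


def quadratic_update(theta, data_cov, n_modes, a00, n_s, c, w_s):
    """One iteration of Eq. (16), with F evaluated from Eq. (18)."""
    n = len(theta)
    A_inv = np.linalg.inv(model_cov(theta, a00, n_s, c, w_s))
    M = []
    for i in range(n):
        dA = np.zeros((n + 1, n + 1))
        dA[0, i + 1] = dA[i + 1, 0] = n_s[i] * c[i]  # dA/dtheta_i
        M.append(A_inv @ dA)
    # Sum over modes of x^dagger A^-1 A_,j A^-1 x - Tr[A^-1 A_,j], written
    # in terms of the mode-averaged data covariance data_cov = <x x^dagger>.
    score = n_modes * np.array(
        [np.trace(M[j] @ A_inv @ data_cov) - np.trace(M[j]) for j in range(n)])
    F = np.array([[0.5 * n_modes * np.trace(M[i] @ M[j]) for j in range(n)]
                  for i in range(n)])
    return theta + 0.5 * np.linalg.solve(F, score)


def demo():
    theta_true = np.array([800.0, 1000.0, 1200.0])
    n_s = np.full(3, 100.0)
    c = np.full(3, 1.0e-4)
    w_s = np.full(3, 100.0)
    a00 = np.sum(theta_true ** 2 * c) + 3000.0
    data_cov = model_cov(theta_true, a00, n_s, c, w_s)  # noise-free "data"
    guess = 0.01 * theta_true
    return theta_true, quadratic_update(guess, data_cov, 1000.0, a00, n_s, c, w_s)


if __name__ == "__main__":
    print(demo())
```

With noisy band powers, or once the dependence of $A_{00}$ on the parameters is restored, the step is no longer exact and the update must be iterated.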

It is worth noting two subtleties in our approach: First,
we do not consider estimators for the $N_i^{(p)}$ that simultaneously
estimate the $w_i^{(ps)}$, although this would be a small
generalization of Eq. (16). Instead, we assume that the $w_i^{(ps)}$
can be measured independently from the $N_i^{(p)}$, which should
hold because of the much different scaling of the cosmological
and stochastic components in the $\langle p\, s_i \rangle$. Larger $\ell$ can
also be utilized for the $w_i^{(ps)}$ estimate than are useful for
constraining the $N_i^{(p)}$. Second, our expressions do not consider
the case in which the true value for $N_i^{(s)}$ differs from
the measured number density owing to large-scale modes on
the scale of the survey. Such an error will be most important
in narrow fields. One can easily take this effect into
account by using the measured number in a prior on the
field to field fluctuations and then marginalizing over the
$N_i^{(s)}$ (Appendix A2).



2.3 Idealized application 

Bearing these caveats in mind, Eq. (18) allows us to estimate
the sensitivity of a hypothetical survey. The solid
curves in Fig. 2 show these estimates for an idealized case
in which the $N_i^{(x)}$ are equal, have redshift-independent clustering
(see Eq. 10), and span the redshift range 0–1 with
10 redshift bins. The curves represent contours of constant
sensitivity on the parameter $b^{(p)}_{N_{\rm bin}/2} N^{(p)}_{N_{\rm bin}/2}$ (i.e., the fractional
error on the bias times the angular number density of photometric
objects in the fifth redshift bin) as a function of the
$dN^{(s)}/dz$ and $dN^{(p)}/dz$ used in the cross correlations. The
labels on the black solid curves are $\log_{10}$ of the fractional
error. The solid curves in the right panel of Fig. 3 are the
same except assuming a survey in which z = 0–1 is spanned
with 100 redshift bins, which approximately results in $\sqrt{10}$
larger errors. The other contours in both figures show different
approximations that are developed in Section 3. All
of the curves are computed for a fractional sky coverage of
$f_{\rm sky} = 0.01$, but the errors scale as $f_{\rm sky}^{-1/2}$ for surveys with
areas $\gg 1$ deg$^2$ (Section 5.3).

[Figure 2; horizontal axis: $\log_{10}(dN^{(s)}/dz\ [{\rm deg}^{-2}])$.]

Figure 2. Contours of $\log_{10}$ of the fractional error in $b^{(p)}_{N_{\rm bin}/2} N^{(p)}_{N_{\rm bin}/2}$
for an idealized survey where the $N_i^{(x)}$ are equal and span z = 0–1
with 10 redshift bins of equal width, covering 1 per cent of the sky
(400 deg$^2$). Contours are labelled for the solid curves, and the corresponding
contour for the other curves is the adjacent curve at
higher number densities. The calculations assume our fiducial parameters
except $f_{\rm over} = 0$. (For our fiducial value of $f_{\rm over} = 1$, the
curves buckle outward when the number densities become equal.)
The black solid curves are the full calculation (in the Limber
approximation, but this matters negligibly for $\Delta z \gtrsim 0.1$). The
purple dotted curves show the approximation that sets to zero
terms in F in which the derivatives hit $A_{00}$. The short dashed
green is the diagonal approximation to the remaining Fisher matrix,
a limit that also works excellently. The long dashed blue is
the error on the estimator in the Schur-Limber limit in which
$S \to 1$ (Section 3.2 and Eq. 35).

While the contours in Figs. 2 and 3 are for the simplistic
case of constant $dN^{(p)}/dz$ and $dN^{(s)}/dz$, they illustrate
a few of our results. First, the sensitivity to the $N_i^{(p)}$ saturates
once either the photometric or spectroscopic dN/dz
becomes larger than the other. Second, the contours show
that percent-level constraints for $\Delta z = 0.1$ are possible even
for number densities of $dN^{(s)}/dz \sim dN^{(p)}/dz \sim 10^3$ deg$^{-2}$
if 10 per cent of the sky is utilized.

We find that the calculations in Figs. 2 and 3 can
be crudely applied beyond the assumption of constant
$dN^{(p)}/dz$, of constant $dN^{(s)}/dz$, or of the redshift at which
they were computed. For example, if these calculations are
used to estimate the sensitivity of the LSST gold sample,
which will have $dN^{(p)}/dz \sim 10^5$ deg$^{-2}$ over a quarter
of the sky (LSST Science Collaboration et al. 2009),
one finds that percent-level determinations of the $N_i^{(p)}$ are
possible in $\Delta z \sim 0.1$ bins with spectroscopic follow up of
$dN^{(s)}/dz \sim 10^3$ deg$^{-2}$ (comparable to the sky density of
BigBOSS emission line galaxies). This estimate is consistent
with the conclusions of more detailed calculations (Section 4).
Also, the LEGACY plus the ongoing BOSS quasar
samples on SDSS provide a spectroscopic number density of
$dN^{(s)}/dz \sim 10$ deg$^{-2}$ out to $z \approx 2.7$ over $\sim 10^4$ deg$^2$ (with
double this number density at z ~ 2.3; SDSS-III Collaboration
et al. 2012b). Fig. 2 suggests that cross-correlations
with denser photometric surveys should provide ~ 10 per
cent errors on their $N_i^{(p)}$ for $f_{\rm sky} \sim 0.1$, again consistent
with what we find later on.

We now turn to building intuition for the estimator pre- 
sented in Section 2.2. 



3 APPROXIMATIONS AND SPECIAL CASES 

In this section, we provide an understanding of the shape 
of the contours in Figs. 2 and 3, we discuss which scales 
contribute to the $N_i^{(p)}$ estimate, and we provide intuitive formulae
that can be quickly applied to gauge the utility of
cross correlating different samples. 



3.1 The Limber approximation 

If the theoretical power spectrum is smooth and our signal
is coming primarily from scales which are small compared
to the width of each redshift shell, then the Limber approximation
applies (Limber 1953, 1954) and our expressions
simplify significantly. The Limber approximation assumes
that $P(k_\perp, k_\parallel)$ varies slowly as a function of $k_\parallel$ compared to
$j_\ell(k\chi)$, which should hold when $\ell \gg \chi/\Delta\chi_i$. Making use
of the identity

$$\int_0^\infty k^2 dk\, j_\ell(k\chi)\, j_\ell(k\chi') = \frac{\pi}{2\chi^2}\, \delta^D(\chi - \chi'), \qquad (19)$$

where $\delta^D$ is the Dirac delta function, and the Limber approximation,
Eq. (13) simplifies dramatically and $C_{ij}(\ell)$ becomes
diagonal (Kaiser 1992; White & Hu 2000):

$$\begin{aligned}
C_{ij}(\ell) &= \delta^K_{ij} \int d\chi\, D^2(\chi)\, W_i^2(\chi)\, \frac{P(\ell/\chi)}{\chi^2} && (20) \\
&\approx \delta^K_{ij}\, D_i^2\, \frac{P(\ell/\chi_i)}{\chi_i^2\, \Delta\chi_i}. && (21)
\end{aligned}$$

We discuss how the Limber limit is approached and compute
the corrections owing to RSDs in Appendix B (where
we show that RSDs enter at $O([\ell\, \Delta\chi/\chi]^{-2})$, which means
they contribute negligibly on scales where the Limber approximation
applies).
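
Eq. (21) and the validity condition $\ell \gg \chi/\Delta\chi_i$ are simple to evaluate. The following minimal sketch does so for the fiducial flat $\Lambda$CDM background; the power spectrum `toy_power` is a hypothetical power law in arbitrary units standing in for the linear-theory P(k), and the helper names and Simpson integrator are our own choices.

```python
import math

OMEGA_M, OMEGA_L, H = 0.27, 0.73, 0.71  # fiducial values quoted in the text
HUBBLE_DIST = 2997.92458 / H            # c / H0 in Mpc


def _simpson(f, a, b, n=400):
    """Composite Simpson rule with an even number of intervals n."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def comoving_distance(z):
    """Comoving distance in Mpc for the flat LCDM background."""
    return HUBBLE_DIST * _simpson(
        lambda x: 1.0 / math.sqrt(OMEGA_M * (1.0 + x) ** 3 + OMEGA_L), 0.0, z)


def growth_factor(z):
    """Linear growth factor with D(0) = 1 (matter plus Lambda only)."""
    def unnorm(a):
        integral = _simpson(
            lambda x: x ** 1.5 / (OMEGA_M + OMEGA_L * x ** 3) ** 1.5, 0.0, a)
        return math.sqrt(OMEGA_M / a ** 3 + OMEGA_L) * integral
    return unnorm(1.0 / (1.0 + z)) / unnorm(1.0)


def toy_power(k, amp=2.0e4, k0=0.02, index=-1.2):
    """Hypothetical power-law P(k); not the spectrum used in the paper."""
    return amp * (k / k0) ** index


def limber_cii(ell, z_lo, z_hi, power=toy_power):
    """Eq. (21): C_ii ~ D_i^2 P(ell / chi_i) / (chi_i^2 Delta chi_i)."""
    z_mid = 0.5 * (z_lo + z_hi)
    chi = comoving_distance(z_mid)
    dchi = comoving_distance(z_hi) - comoving_distance(z_lo)
    return growth_factor(z_mid) ** 2 * power(ell / chi) / (chi ** 2 * dchi)


def limber_min_ell(z_lo, z_hi):
    """chi / Delta chi for the bin; Limber requires ell well above this."""
    chi = comoving_distance(0.5 * (z_lo + z_hi))
    return chi / (comoving_distance(z_hi) - comoving_distance(z_lo))


if __name__ == "__main__":
    print(limber_min_ell(0.45, 0.55), limber_cii(200, 0.45, 0.55))
```

Halving the bin width at fixed central redshift doubles $C_{ii}$ and doubles the multipole above which the approximation holds, which is the trend seen when going from $\Delta z = 0.1$ to $\Delta z = 0.01$.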

The majority of past studies (Newman 2008; Matthews
& Newman 2010; Schneider et al. 2006) have used the Limber
approximation. Fig. 3 shows that this approximation
provides a good estimate for the variance of our $N_i^{(p)}$ estimator,
with only a small error in the case of $\Delta z = 0.1$
(left panel) and the error starting to become significant for
$\Delta z = 0.01$ (right panel). In both panels, compare the solid
contours, which assume Limber, with the dashed contours,
which do not. The Limber approximation is accurate because,
as we will show, much of the estimator's constraint
derives from $\ell$ where it should hold. (The percent-level bias
introduced by this approximation is quantified in Section 6.)
[Figure 3; horizontal axes, both panels: $\log_{10}(dN^{(s)}/dz\ [{\rm deg}^{-2}])$.]

Figure 3. Contours of $\log_{10}$ of the fractional error in $b^{(p)}_{N_{\rm bin}/2} N^{(p)}_{N_{\rm bin}/2}$ in the Limber approximation (black solid curves) and the full
calculation without this approximation (blue dashed curves; which for the same fractional error fall immediately upward of the solid
curves). The contours are calculated for a survey that spans z = 0–1 with 10 (left panel) and 100 (right panel) redshift bins of equal
width over 1 per cent of the sky. Roughly, the errors are $\sqrt{10}$ larger in the right panel than in the left panel. This figure illustrates that
the Limber approximation works well for the $\Delta z = 0.1$ case, but is starting to break down at $\Delta z = 0.01$. While making the Limber
approximation leads to errors in the uncertainty estimate, we show in Section 6 that the bias on $N_i^{(p)}$ is always quite small.



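The covariance matrix of the photometric and spectroscopic surveys simplifies considerably in the Limber approximation, with only the $A_{0\alpha}$ terms and the diagonal components of $A_{ij}$ being nonzero, namely

$$ \begin{aligned}
A_{00} &= \sum_i [N_i^{(p)} b_i^{(p)}]^2\, C_{ii} + w^{(p)}, \\
A_{0i} &= N_i^{(p)} N_i^{(s)} b_i^{(p)} b_i^{(s)}\, C_{ii} + w_i^{(ps)}, \\
A_{ii} &= [N_i^{(s)} b_i^{(s)}]^2\, C_{ii} + w_i^{(s)}, \\
A_{ij} &= 0 \quad (i \neq j).
\end{aligned} \tag{22-25} $$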



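Furthermore, this $A(\ell, m)$ can be inverted analytically, yielding

$$ [A^{-1}]_{00} = \frac{S}{A_{00}}, \tag{26} $$

$$ [A^{-1}]_{0i} = -\frac{S\, A_{0i}}{A_{00} A_{ii}}, \tag{27} $$

$$ [A^{-1}]_{ij} = \frac{\delta^K_{ij}}{A_{ii}} + \frac{S\, A_{0i} A_{0j}}{A_{00} A_{ii} A_{jj}}, \tag{28} $$

with

$$ S = \left( 1 - \sum_{i=1}^{N_{\rm bin}} \frac{A_{0i}^2}{A_{00} A_{ii}} \right)^{-1} = \left( 1 - \sum_{i=1}^{N_{\rm bin}} r_i^2 \right)^{-1}, \tag{29} $$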



where $r_i(\ell) = A_{0i}/(A_{00} A_{ii})^{1/2}$ is the cross correlation coefficient between $p$ and $s_i$, and again we are using the convention $i, j \in 1, \ldots, N_{\rm bin}$. The above inverse can be derived using the Schur complement matrix identity and the Sherman-Morrison-Woodbury formula (Petersen & Pedersen 2008, their equations 8.8.3 and 3.2.1).
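The analytic inverse can be checked numerically. The following is a minimal sketch, assuming the covariance has the structure described above (a single photometric row and column, $A_{00}$ and $A_{0i}$, plus a diagonal spectroscopic block $A_{ii}$). The function names and the toy numbers are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def limber_covariance(a00, a0, d):
    """Build the (n+1)x(n+1) Limber covariance: A_00, A_0i and diagonal A_ii."""
    n = len(a0)
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    A[0][0] = a00
    for i in range(n):
        A[0][i + 1] = A[i + 1][0] = a0[i]
        A[i + 1][i + 1] = d[i]
    return A


def limber_inverse(a00, a0, d):
    """Closed-form inverse built from the Schur parameter S."""
    n = len(a0)
    sum_r2 = sum(a0[i] ** 2 / (a00 * d[i]) for i in range(n))  # sum of r_i^2
    S = 1.0 / (1.0 - sum_r2)
    inv = [[0.0] * (n + 1) for _ in range(n + 1)]
    inv[0][0] = S / a00
    for i in range(n):
        inv[0][i + 1] = inv[i + 1][0] = -S * a0[i] / (a00 * d[i])
        for j in range(n):
            inv[i + 1][j + 1] = S * a0[i] * a0[j] / (a00 * d[i] * d[j])
        inv[i + 1][i + 1] += 1.0 / d[i]
    return inv, S


def max_identity_error(A, B):
    """Largest element of |A B - I|."""
    n = len(A)
    err = 0.0
    for i in range(n):
        for j in range(n):
            prod = sum(A[i][k] * B[k][j] for k in range(n))
            err = max(err, abs(prod - (1.0 if i == j else 0.0)))
    return err


# Hypothetical toy values for one (ell, m) mode with four redshift bins.
a00 = 1.5
a0 = [0.30, 0.25, 0.20, 0.15]
d = [1.0, 1.2, 1.4, 1.6]

A = limber_covariance(a00, a0, d)
A_inv, S = limber_inverse(a00, a0, d)
err = max_identity_error(A, A_inv)
print(f"S = {S:.4f}, max |A A^-1 - I| = {err:.2e}")
```

Because the spectroscopic block is diagonal, the inverse costs only $O(N_{\rm bin}^2)$ operations per mode, and the single scalar $S$ carries all of the coupling between the bins.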

The 'Schur parameter', $S$, is greater than or equal to unity and quantifies the extent of correlation between the spectroscopic and photometric samples. In the case of complete redshift overlap of the spectroscopic sample and in the absence of shot-noise, $S \to \infty$ and the $N_i^{(p)}$ are perfectly constrained. If the unknown sample is limited by shot-noise, or if the two samples cover different redshift ranges, $S \to 1^+$. The implication is that even a small amount of noise diminishes considerably the constraining power of a mode.

In the analytic derivations that follow, we ignore derivatives that hit the $A_{00}$ in Eqs. (16) and (18), as this element provides only an integral-like constraint on the $N_i^{(p)}$. For all relevant limits, the approximation of ignoring the $A_{00}$-derivatives is excellent: Fig. 2 compares the solid black error contours, which include the $A_{00}$-derivatives, with the nearly-overlapping dotted purple contours, which do not. With this additional simplification, the Limber-approximation Fisher matrix (Eq. 18) is

$$ F_{ij} \approx \sum_{\ell,m} \left( [A^{-1}]_{ij}\,[A^{-1}]_{00} + [A^{-1}]_{0i}\,[A^{-1}]_{0j} \right) [A_{0i}]_{,i}\,[A_{0j}]_{,j} $$

$$ = \sum_{\ell,m} \frac{S}{A_{00}} \left( \frac{\delta^K_{ij}}{A_{ii}} + \frac{2 S\, A_{0i} A_{0j}}{A_{00} A_{ii} A_{jj}} \right) [A_{0i}]_{,i}\,[A_{0j}]_{,j}. \tag{30} $$



Furthermore, the minimum variance quadratic estima- 
tor becomes 



(p) 



*■ — ' Ann An 



(31) 



where repeated indices that do not appear in the same quantity are summed. (The complete Limber estimator, where $[A_{00}]_{,i}$ terms are maintained, is given in Appendix A1; see footnote 9.)

Figure 4. An illustration of the scales that contribute to the constraint on $N_i^{(p)}$ in different limits. The areas under these curves, which are of $d[1/F^{-1}_{ii}]/d\log\ell$, are proportional to the information that contributes to the estimate in the $i = 6$ bin for a measurement in 10 redshift bins with $\Delta z = 0.1$ and spanning $0 < z < 1$. For illustrative purposes we have assumed constant $dN^{(p)}/dz$ and $dN^{(s)}/dz$. The first adjective for each curve's label in the key (rare-rare, many-rare, many-many, rare-many) describes the spectroscopic sample (rare = 10 deg$^{-2}$ and many = $10^5$ deg$^{-2}$), and the second describes the photometric sample (rare = 100 deg$^{-2}$ and many = $10^6$ deg$^{-2}$). However, the curves are not significantly impacted at linear scales by the assumed densities as long as 'many' equates to $> 10^4$ deg$^{-2}$ and 'rare' to $< 10^3$ deg$^{-2}$, with the exception being the many-many case. In the text we describe why these limits select the scales that they do. The vertical lines denote significant scales discussed in the text. The thin red dot-dashed curve does not assume the Limber approximation whereas the corresponding thick curve assumes it.

Figs. 4, 5 and 6 motivate why the approximations of Limber and linear theory are justified. Fig. 4 shows the scales that contribute to the estimator for several different cases, plotting $d[1/F^{-1}_{ii}]/d\log\ell$. The areas under these curves are proportional to the information that contributes to the estimate in the $i = 6$ bin for a measurement in 10 redshift bins with $\Delta z = 0.1$ spanning $0 < z < 1$. The first adjective for each curve's label in the key describes the spectroscopic sample (rare = 10 deg$^{-2}$ per $dz$ and many = $10^5$ deg$^{-2}$ per $dz$) and the second describes the photometric sample (rare = 100 deg$^{-2}$ per $dz$ and many = $10^6$ deg$^{-2}$ per $dz$), where these number densities are assumed constant with redshift for illustration. This figure indicates that (at least for these extremities of the parameter space) the bulk of the information derives from modes around where the density power spectrum has power-law index $-2$ and $-1$, $\ell_{Pk-2}$ and $\ell_{Pk-1}$, respectively. As we shall discuss further, correlations between two rare samples (where rare is defined as having $\ell_0 \lesssim \ell_{Pk-1}$) constrain $N_i^{(p)}$ primarily from multipoles with $\ell \sim \ell_{Pk-1}$. Rare and abundant samples use multipoles with $\ell \sim \ell_{Pk-2}$, which also holds in the case in which both samples are extremely abundant. We show later that it is possible in less extreme examples in which both samples are relatively abundant for the information to derive primarily from the scale $\ell_0$.

9 Our Limber "Fisher Matrix" that drops the off-diagonal terms can violate the Rao-Cramer bound, as can be noted in Fig. 2: The purple dotted contours are not above the black solid contours (which saturate the Rao-Cramer bound for our problem) at all number densities, falling just slightly below at the largest $dN/dz$. This is not an issue for our purposes.

Figure 5. Shown are $C_{ii}$ for $z_i = 1$, $\Delta z_i = 0.1$, and our fiducial bias model. The $C_{ii}$ are calculated under various approximations - linear theory (dashed curve) and the Limber approximation (solid curves) - and for the full Peacock & Dodds (1996) nonlinear power spectrum (dotted curve). Also depicted are the stochastic component of the power for two characteristic number densities and $f_{\rm sat} = 0$ (horizontal lines). The auto-power of spectroscopic bin $i$, $\langle s_i^2 \rangle$, equals $C_{ii}$ plus the stochastic component. The optimal quadratic estimator selects information that roughly falls in the range of the two vertical lines (Section 3), between where the $P(k)$ roughly scales as $k^{-1}$ and $k^{-2}$. Conveniently, both linear theory and the Limber approximation apply around these scales.

To orient the reader, Fig. 5 shows estimates for the $C_{ii}$ at $z = 1$ and for $\Delta z = 0.1$ that use linear theory, the Limber approximation, and the Peacock & Dodds (1996) nonlinear power spectrum. The vertical lines show $\ell_{Pk-2}$ and $\ell_{Pk-1}$. $\ell_0$ is the scale at which the (horizontal) stochastic power becomes equal to the $C_{ii}$, i.e. where the dotted (red) lines equal the black curve. We show the stochastic terms for two illustrative number densities. In particular, the upper horizontal line in Fig. 5 is the lowest number density at
which Wj? > [b^ ] 2 Cu is satisfied at all £, which we
denote as [t^]" 1 ', where

Figure 6. Shown are characteristic $\ell$ values for cross-correlation analyses as a function of $z$. The lower (red) shaded region delineates $\ell \lesssim 2\chi(z)/\Delta\chi$ for $\Delta z = 0.05$, approximately where the Limber approximation errors are tens of percent (Appendix B; Eq. B7). The upper (blue) shaded region is where deviations from linear perturbation theory are a factor of $\gtrsim 2$. The other curves show characteristic scales at which $dN^{(p)}/dz$ estimates receive the bulk of their information. The dashed curves show the multipole where the Poisson term is equal to the clustering term, which we denote as $\ell_0$, for surveys with number densities of $b^2 dN/dz = 10^2$, $10^3$, $10^4$, and $10^5$ deg$^{-2}$. The magenta curves are the scales where the density power spectrum has power-law index $-2$ and $-1$, $\ell_{Pk-2}$ and $\ell_{Pk-1}$, respectively. Correlations between two rare samples (where rare is defined as having $\ell_0 \lesssim \ell_{Pk-1}$) utilize modes with $\ell \sim \ell_{Pk-1}$ to constrain $dN^{(p)}/dz$. However, rare and abundant samples use modes with $\ell \sim \ell_{Pk-2}$, whereas if both samples are abundant the information comes from $\ell \sim \ell_0$ in certain cases.



Eq. 32 uses the Limber approximation, takes $f_{\rm sat} = 0$, and approximates the redshift dependence as a power-law evaluated at $z = 1$. In addition, the lower horizontal line is the number density at which $w_i = [N_i b_i]^2 C_{ii}(\ell_{Pk-2})$, or



dN 



dz 



8000 b~ 



l + z 



deg" 



(33) 



Both critical number densities are shown in Fig. 5 for our 
fiducial bias model. We return to the significance of these 
numbers in future sections. 

We often will approximate the scale at which linear theory no longer holds as

$$ k_{\rm NL} \approx 0.25\,(1 + z)\ {\rm Mpc}^{-1}, \tag{34} $$

which we find is close to the scale at which the Peacock & Dodds (1996) nonlinear density power spectrum overshoots linear theory by a factor of 2 for the redshifts of interest. We define $\ell_{\rm NL} = \chi k_{\rm NL}$, which is plotted in Figs. 4 and 6 and used throughout as the limit of validity of our assumptions. Fig. 6 shows that $\ell_0$ falls in the range in which both linear theory and the Limber approximation more or less apply across all relevant redshifts and number densities. Linear theory also applies for $\ell_{Pk-1}$ and $\ell_{Pk-2}$. We note that $\ell_{Pk-1}$ [$\ell_{Pk-2}$] corresponds to a transverse physical scale of $k \sim 0.03$ Mpc$^{-1}$ [$k = 0.2$ Mpc$^{-1}$] (Table 2).

k       n_eff      k       n_eff      k       n_eff
0.01     0.05      0.1     -1.7       1       -2.4
0.02    -0.7       0.2     -2.0       2       -2.5
0.05    -1.3       0.5     -2.3       5       -2.7

Table 2. The instantaneous power-law slope of the linear theory power spectrum as a function of wavenumber, $k$, in Mpc$^{-1}$ ($n_{\rm eff} = d\log P/d\log k$). The values were computed using the Eisenstein & Hu (1998) matter transfer function without baryon acoustic features.
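The characteristic scales quoted above can be reproduced approximately from Table 2 and Eq. (34). The sketch below interpolates the tabulated slopes to find where $n_{\rm eff}$ crosses $-1$ and $-2$ and converts wavenumbers to multipoles with $\ell = \chi k$. The flat $\Lambda$CDM parameters ($\Omega_m = 0.3$, $h = 0.7$) and all function names are assumptions made for illustration only; the paper's fiducial cosmology is not stated in this section.

```python
import math

# Table 2 as read from the text: k in Mpc^-1 and n_eff = dlogP/dlogk.
TABLE2 = [(0.01, 0.05), (0.02, -0.7), (0.05, -1.3), (0.1, -1.7), (0.2, -2.0),
          (0.5, -2.3), (1.0, -2.4), (2.0, -2.5), (5.0, -2.7)]


def k_at_slope(target):
    """Wavenumber where n_eff crosses `target`, interpolating linearly in log k."""
    for (k1, n1), (k2, n2) in zip(TABLE2, TABLE2[1:]):
        if n2 <= target <= n1:
            t = (n1 - target) / (n1 - n2)
            return k1 * (k2 / k1) ** t
    raise ValueError("slope outside the tabulated range")


def comoving_distance(z, omega_m=0.3, h=0.7, steps=1000):
    """Comoving distance in Mpc for flat LCDM (assumed, illustrative parameters)."""
    c_over_h0 = 2997.92458 / h  # c / H0 in Mpc
    dz = z / steps
    total = 0.0
    for i in range(steps):
        zm = (i + 0.5) * dz  # midpoint rule
        total += dz / math.sqrt(omega_m * (1 + zm) ** 3 + 1 - omega_m)
    return c_over_h0 * total


def k_nl(z):
    """Nonlinear scale of Eq. (34), in Mpc^-1."""
    return 0.25 * (1 + z)


z = 1.0
chi = comoving_distance(z)
ell_pk1 = chi * k_at_slope(-1.0)
ell_pk2 = chi * k_at_slope(-2.0)
ell_nl = chi * k_nl(z)
print(f"chi = {chi:.0f} Mpc, ell_Pk-1 = {ell_pk1:.0f}, "
      f"ell_Pk-2 = {ell_pk2:.0f}, ell_NL = {ell_nl:.0f}")
```

With these assumed parameters the interpolated wavenumbers land close to the $k \sim 0.03$ and $k = 0.2$ Mpc$^{-1}$ quoted in the text, and $\ell_{\rm NL}$ at $z = 1$ comes out at the order of $10^3$.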



3.2 The Schur-Limber limit 

We now investigate the above Limber-approximation estimator in the limit $S(\ell) \to 1^+$ and show that a small tweak to this limit captures almost all of the information in the general case. We refer to the $S \to 1^+$ limit as the 'Schur limit' henceforth. In this limit the information originates from modes where $\sum_i r_i^2 \ll 1$, either because of incomplete overlap of the spectroscopic survey or because shot noise is important. In many interesting cases this limit at least marginally holds. Importantly, both $A$ and $F$ are diagonal in the Schur limit, viz

$$ F^S_{ij} = \sum_{\ell,m} \frac{([A_{0i}]_{,i})^2}{A_{00} A_{ii}}\, \delta^K_{ij}, \tag{35} $$

where the superscript $S$ denotes the Schur limit. Furthermore, the estimator becomes

$$ \hat{N}_i^{(p)} = [N_i^{(p)}]_{\rm last} + [F^S]^{-1}_{ii} \sum_{\ell,m} \frac{[A_{0i}]_{,i}}{A_{00} A_{ii}} \left\{ p\,s_i - A_{0i} \right\}, \tag{36} $$

such that the number density in each bin is now estimated independently and is proportional to the cross-power, $p\,s_i$, minus a constant. The Schur-Limit approximation yields the long-dashed blue curves for the errors on the $N_i^{(p)}$ shown in Fig. 2. These trace the contours in the full calculation (compare with the solid contours) at $dN/dz < 10^3$ deg$^{-2}$, but begin to deviate if both samples have higher number densities, as is expected.

Two notes in passing: (1) The structure of $F^S$ is reminiscent of the optimal weight in the Feldman et al. (1994) definition of the effective volume. While our expression is in harmonic space, the structure has the form $[nP/(1 + nP)]^2$ just as in Feldman et al. (1994). This is not surprising as our estimator is asking a similar question to 'What is the significance that the cross power can be detected?' (2) The Schur-Limber estimator is exact in the limits where Limber holds and $S = 1$, and does not require dropping certain derivative terms as was required to derive Eq. (31).

To see how the Schur-Limber estimator works, we take the case in which a single $\ell, m$ mode contributes to the estimate such that

$$ \hat{N}_i^{(p)} = [N_i^{(p)}]_{\rm last} + \frac{p\,s_i - A_{0i}}{[A_{0i}]_{,i}}. \tag{37} $$

If the true $N_i^{(p)}$ differs from the fiducial model, $[N_i^{(p)}]_{\rm last}$, by $\delta N_i^{(p)}$, we have the relations

$$ p\,s_i = \left( [N_i^{(p)}]_{\rm last} + \delta N_i^{(p)} \right) N_i^{(s)} b_i^{(p)} b_i^{(s)}\, C_{ii}^{\rm data} + w_i^{(ps)}, \tag{38} $$

where $C_{ii}^{\rm data}$ is the actual density power in this harmonic, and

$$ A_{0i} = [N_i^{(p)}]_{\rm last}\, N_i^{(s)} b_i^{(p)} b_i^{(s)}\, C_{ii} + w_i^{(ps)}. \tag{39} $$

Plugging these into Eq. (37) yields

$$ \langle \hat{N}_i^{(p)} \rangle = [N_i^{(p)}]_{\rm last} + \delta N_i^{(p)} = N_i^{(p)}, \tag{40} $$

noting that $\langle C_{ii}^{\rm data}(\ell, m) \rangle = C_{ii}$. Thus, the iteration converges in a single step, and the estimate is unchanged with subsequent iterations. The former is no longer the case when multiple $\ell$ are used in the estimate, but we show that the estimator still converges in just a few iterations in Section 8.
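The single-step convergence can be illustrated with a toy calculation. The sketch below assumes the single-mode update takes the form "last estimate plus (measured cross-power minus fiducial cross-power) divided by the derivative of the fiducial cross-power", and it uses a noise-free measurement so that the data power equals the fiducial power. The variable `kernel` stands in for the product $N_i^{(s)} b_i^{(p)} b_i^{(s)}$; all numbers are hypothetical.

```python
def schur_limber_update(n_last, ps_i, kernel, c_fid, w_ps=0.0):
    """One single-mode update: N_last + (p s_i - A_0i) / [A_0i]_,i."""
    a0i = n_last * kernel * c_fid + w_ps  # fiducial cross-power for the current guess
    da0i = kernel * c_fid                 # derivative of A_0i with respect to N_i^(p)
    return n_last + (ps_i - a0i) / da0i


# Hypothetical toy inputs.
n_true = 120.0
kernel = 3.0
c_fid = 0.02
w_ps = 0.5

# Noise-free "measured" cross-power, taking C_data equal to the fiducial C_ii.
ps_i = n_true * kernel * c_fid + w_ps

first = schur_limber_update(50.0, ps_i, kernel, c_fid, w_ps)   # start far from the truth
second = schur_limber_update(first, ps_i, kernel, c_fid, w_ps)  # iterate once more
print(first, second)
```

Because the cross-power is linear in $N_i^{(p)}$, the first update already lands on the input value and the second leaves it unchanged. With noisy data the same statement holds only in expectation.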

The structure of the formula for the Fisher matrix in this Schur limit (Eq. 35) is also quite simple, and is most easily brought out by considering the case where the underlying power spectrum is a power-law, $C_{ii} = C_i \ell^n$ (we review some of the theory of power-law power spectra in Appendix E),

$$ F^S_{ij} = [N_i^{(p)}]^{-2} \sum_{\ell,m} \frac{c_i^{(p)} c_i^{(s)}\, \ell^{2n}\, \delta^K_{ij}}{\left( c^{(p)} \ell^n + w^{(p)} \right) \left( c_i^{(s)} \ell^n + w_i^{(s)} \right)}, \tag{41} $$

where we have written $c_i^{(X)} = [N_i^{(X)} b_i^{(X)}]^2 C_i$ and $c^{(p)} = \sum_i c_i^{(p)}$. The CDM case is similar, except that the spectrum has a power-law index which becomes increasingly negative towards smaller scales (see Table 2). Eq. (41) provides intuition into the shape of the contours in Fig. 2 as we shall now discuss.
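The scalings implied by the power-law form can be seen numerically. The sketch below assumes the per-mode summand is $c_i^{(p)} c_i^{(s)} \ell^{2n} / [(c^{(p)}\ell^n + w^{(p)})(c_i^{(s)}\ell^n + w_i^{(s)})]$ and multiplies it by $\ell^2$ to approximate the number of modes per logarithmic interval in $\ell$. It then measures the logarithmic slope in three regimes. The coefficients are arbitrary illustrative values, not survey parameters.

```python
import math


def schur_kernel(ell, n, c_p, c_ip, c_is, w_p, w_s):
    """Per-mode summand of the Schur-limit Fisher matrix for C_ii = C_i ell^n."""
    pl = ell ** n
    return c_ip * c_is * pl * pl / ((c_p * pl + w_p) * (c_is * pl + w_s))


def slope_per_log_ell(ell, **kw):
    """d log[ell^2 * summand] / d log ell by a centered finite difference."""
    eps = 1e-3
    hi, lo = ell * (1 + eps), ell * (1 - eps)
    up = math.log(hi ** 2 * schur_kernel(hi, **kw))
    dn = math.log(lo ** 2 * schur_kernel(lo, **kw))
    return (up - dn) / (math.log(hi) - math.log(lo))


n = -1.5
base = dict(n=n, c_p=10.0, c_ip=1.0, c_is=1.0)
ell = 100.0  # here ell^n = 1e-3

# Neither sample shot-noise limited: all modes count equally, slope ~ 2.
abundant = slope_per_log_ell(ell, w_p=1e-9, w_s=1e-9, **base)
# Spectroscopic sample shot-noise limited: slope ~ n + 2.
rare_spec = slope_per_log_ell(ell, w_p=1e-9, w_s=1e3, **base)
# Both samples shot-noise limited: slope ~ 2n + 2.
rare_rare = slope_per_log_ell(ell, w_p=1e3, w_s=1e3, **base)
print(abundant, rare_spec, rare_rare)
```

The sign changes of these slopes are what pick out the characteristic scales: the contribution per $\log\ell$ stops growing once $n < -2$ when one sample is shot-noise limited, and once $n < -1$ when both are.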



3.3 Abundant galaxy limit 

At $\ell$ where neither the photometric nor the spectroscopic survey is limited by shot noise, all $\ell$ contribute equally and the argument in the sum in Eq. (41) is roughly constant in $\ell$. However, once shot noise becomes appreciable for either survey ($\ell > \ell_0$), the argument in the sum scales as $\ell^n$. At scales where $n < -2$, which becomes increasingly satisfied at smaller scales with CDM spectra (see Table 2), this scaling cuts off the sum as shells of increasing $\ell$ contribute progressively less to $F$. If $n > -2$, this is not true, and there is information until scales where both surveys are limited by shot noise (or $n$ has steepened). This explanation is reflected by the contours in Fig. 2. For number densities where $\ell_0$ occurs at scales at which $n < -2$ ($dN/dz > 8000\, b^{-2}$ deg$^{-2}$), information is gained all the way until $\ell \sim \ell_0$. In this case, the contours are very boxy and Eq. (35) can be approximated as being clustering dominated at $\ell < \ell_0$ and being zero at $\ell > \ell_0$:



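$$ \frac{\delta N_i^{(p)}}{N_i^{(p)}} \approx \left( \langle \beta_i \rangle\, f_{\rm sky} \left[ \ell_0^2 - \ell_{\rm min}^2 \right] \right)^{-1/2}, \tag{42} $$

where $\ell_{\rm min}$ is the minimum wavenumber used, and $\langle \beta_i \rangle$ is the $\ell$-averaged fraction of the angular power in the photometric sample that comes from $z$-bin $i$,

$$ \beta_i \equiv \frac{[N_i^{(p)} b_i^{(p)}]^2\, C_{ii}(\ell, m)}{\sum_j [N_j^{(p)} b_j^{(p)}]^2\, C_{jj}(\ell, m)}. \tag{43} $$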



For the simple case of slices of fixed number and distant observers (i.e., $\chi$ not changing appreciably across the sample), $\langle \beta \rangle \approx N_{\rm bin}^{-1}$. The left panel in Fig. 7 shows how the sensitivity is increased with increasing $dN^{(s)}/dz$, fixing the photometric population (here a survey complete to $i = 23$) and the survey area. It shows that the prediction of Eq. (42) of a number density-independent error comes into full effect at $dN^{(s)}/dz \gtrsim 10^5$ deg$^{-2}$, which is on par with the maximum number densities for medium-future experiments (see Fig. 1). Values of the Schur parameter greater than unity (Eq. 42 sets $S = 1$) result in some number density dependence even at high $dN^{(s)}/dz$ (see footnote 10). Also, evaluating Eq. (42) for parameters that match the case given in the left panel of Fig. 7 - $\ell_0 = 2000$ (see Fig. 6), $\beta = 0.1$, and 100 deg$^2$ - yields $\delta N/N = 0.03$, which is comparable to the values for the largest $dN^{(s)}/dz$ in this plot.

We have used linear theory in our computations, but scales with $\ell > \ell_{\rm NL}$ should not be used in our formalism. This nonlinear cutoff allows us to roughly estimate the smallest patch of sky for which cross-correlations will yield a useful constraint on $dN^{(p)}/dz$. Evaluating Eq. (42) with $\ell_0 \to \ell_{\rm NL} \sim 10^3$ implies that a square degree is required for cross-correlations to provide an $O(1)$ constraint on $dN^{(p)}/dz$ with our method. When accounting for $S > 1$, the constraint can improve by a factor of $\sim 2$ for physically realizable number densities (as can be gleaned by comparing the Schur estimator's error - the long-dashed blue curve - to the full estimator's error - the solid black curve - at high densities in Fig. 2).
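The two numerical statements above (a fractional error of about 0.03 for the Fig. 7 case, and an order-unity constraint from a single square degree) can be checked with a few lines. The sketch assumes the abundant-limit error has the form $(\beta\, f_{\rm sky}\, [\ell_{\rm max}^2 - \ell_{\rm min}^2])^{-1/2}$ with $S = 1$; the function and argument names are hypothetical.

```python
import math

FULL_SKY_DEG2 = 4 * math.pi * (180 / math.pi) ** 2  # about 41253 deg^2


def abundant_limit_error(beta, area_deg2, ell_max, ell_min=0.0):
    """Fractional error (beta * f_sky * [ell_max^2 - ell_min^2])^(-1/2), with S = 1."""
    f_sky = area_deg2 / FULL_SKY_DEG2
    return (beta * f_sky * (ell_max ** 2 - ell_min ** 2)) ** -0.5


# Parameters quoted in the text for the left panel of Fig. 7.
fig7_case = abundant_limit_error(beta=0.1, area_deg2=100.0, ell_max=2000.0)

# One square degree, cutting the sum off at ell_NL ~ 10^3.
one_square_degree = abundant_limit_error(beta=0.1, area_deg2=1.0, ell_max=1000.0)

print(f"{fig7_case:.3f} {one_square_degree:.2f}")
```

Since the error depends on area and cutoff only through $f_{\rm sky}\,\ell_{\rm max}^2$, halving $\ell_{\rm max}$ costs the same as quartering the survey area in this limit.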



3.4 Rare spectroscopic sample 

Another relevant limit of the Schur-Limber estimator is when the spectroscopic sample is sparse enough that it is dominated by shot noise. In this limit, the Schur approximation ($S \approx 1$) is always justified, and our equations simplify further so that the Fisher matrix becomes

$$ F^S_{ii} = \sum_{\ell,m} \frac{[N_i^{(s)} b_i^{(p)} b_i^{(s)}\, C_{ii}]^2}{\left( \sum_k [N_k^{(p)} b_k^{(p)}]^2\, C_{kk} + w^{(p)} \right) w_i^{(s)}} \propto N_i^{(s)} f_{\rm sky}, \tag{44} $$

for $f_{\rm sat}^{(s)} = 0$. Thus, in this limit the error on the $N_i^{(p)}$ scales with the total number of spectra - it does not depend on the density of spectroscopic sources. It turns out that in many relevant cases cross-correlations will be in this regime (as discussed in Section 4).

What $dN^{(s)}/dz$ are required to be in the rare limit? If $dN^{(s)}/dz$ is less than the $[dN/dz]_{\rm crit}$ of Eq. (32), or roughly a hundred per square degree, the sparse tracer limit certainly holds as the shot component always dominates. However, for even much larger number densities, the rare spectroscopic limit is a good approximation. The Fisher information at each $\ell$ for a rare spectroscopic sample (but abundant photometric sample) keeps increasing until $\ell_{Pk-2}$ (as $dF^S_{ii}/d\log\ell \propto \ell^{n+2}$ so that the contribution to $F^S_{ii}$ decreases in bins of $\log\ell$ once $n < -2$). Thus, to be in the rare limit, it is less important that shot noise dominate at all $\ell$ and most important that shot noise dominates by $\ell_{Pk-2}$. Therefore, once $dN^{(s)}/dz$ is less than the $[dN/dz]_{\rm crit}$ of Eq. (33) the rare limit applies, and the constraint on the $N_i^{(p)}$ only depends on the total number of galaxies.

10 In fact, Eq. (42) should be regarded as an upper bound on the error since we set $S = 1$. When $S$ is large (and here we take mk s ' > w\ and w\ > w f , although similar conclusions apply regardless), S cc J^il^i* 1 Cu/w^ s \ Including $S$ in the summation in Eq. (41) makes the kernel peak at $\ell_{Pk-2}$ for high number densities rather than $\ell_0$. This results in the many-many case peaking at $\ell_{Pk-2}$ in Fig. 4.

Figure 7. Illustration of how the fractional constraints on the $b_i^{(p)} N_i^{(p)}$ depend on area, total number, and densities of the samples. All panels take $\Delta z = 0.05$, a photometric sample down to a limiting magnitude of $i^{(p)} = 23$ (and 100 per cent complete except in the right panel), and a spectroscopic sample for which $dN^{(s)}/dz$ is a constant to $z = 2$. The $P(z|i)$ of the photometric sample is given by the thick solid curve. Left panel: The curves assume a 100 deg$^2$ survey and the specified $dN^{(s)}/dz$. At lower $dN^{(s)}/dz$ the sensitivity improves as the square root of $dN^{(s)}/dz$, as anticipated in the rare-spectra limit, but at high densities the sensitivity does not depend on depth, as anticipated by our abundant limit. Middle panel: The three curves show a spectroscopic sample with fixed total of $10^5$ galaxies and the specified sky densities. The similarity of the sensitivity between these much different densities demonstrates our analytic result that in the rare tracer limit the fractional error scales with the total number of spectroscopic galaxies. Right panel: Varying the fraction, $f$, of photometric galaxies that are used with a spectroscopic sample with angular density 10 deg$^{-2}$, and $10^5$ spectroscopic galaxies. In the limit in which both the photometric and spectroscopic samples are in the rare limit, the fractional sensitivity scales as $f^{-1/2}$. This addresses how well the distribution of an "interesting subsample" of objects could be determined.

The middle panel in Fig. 7 tests this argument that densities less than the $[dN/dz]_{\rm crit}$ of Eq. (33) are in the rare limit. It plots the constraints on $b_i^{(p)} N_i^{(p)}$ for a photometric sample down to a limiting magnitude of $i = 23$, assuming $\Delta z = 0.05$. The three curves each take a spectroscopic sample comprised of $10^5$ galaxies and differing $dN^{(s)}/dz$, where $dN^{(s)}/dz$ is taken to be constant up to $z = 2$ as specified in the figure key. Thus, the three curves represent surveys with the same number of spectroscopic galaxies. The sensitivity changes negligibly with number density until $10^4$ deg$^{-2}$.

The middle panel in Fig. 7, combined with our argument that $\delta N_i^{(p)} \propto [\mathcal{N}_i^{(s)}]^{-1/2}$, where $\mathcal{N}_i^{(s)}$ is the total number of spectroscopic galaxies per unit redshift, suggests that a minimum of $\sim 10^3$ spectroscopic galaxies are needed to have an order unity constraint on $b_i^{(p)} N_i^{(p)}$ (and somewhat fewer if the population is more localized in redshift than in our example or if they are more strongly clustered than in our fiducial model). That $\sim 10^3$ spectroscopic galaxies are required is also apparent from evaluating Eq. (44) in the limits of an abundant photometric and rare spectroscopic survey, which yields

SN^ „ _a^_ ( A£> <A)c \ " /2 (l + z\ - ' 5 

N (p) ~ b M D . y i 3 o.i J \ 2 J ' 1 ; 

where we have assumed bins of fixed $\Delta z$, $\langle \beta_i \rangle_C$ is defined analogously to $\langle \beta_i \rangle$ but weighted by $C_{ii}$, and the redshift factor owes to how lengths map to angles and redshift intervals with $z$ (which we evaluated at $z = 1$, but we find that this formula holds to 20 per cent for $0.1 < z < 3$).

3.5 Rare-rare limit 

The final limit we consider is when the fluctuations in both samples are dominated by shot noise. In this limit, $dF^S_{ii}/d\log\ell \propto \ell^{2n+2}$ such that the contribution to $F^S_{ii}$ decreases in bins of $\log\ell$ once $n < -1$. As with the abundant-rare limit previously considered, we can also evaluate Eq. (35) in the rare-rare limit, which yields

$$ \frac{\delta N_i^{(p)}}{N_i^{(p)}} \approx \frac{1.7}{b^{(s)} b^{(p)} D^2} \left( \frac{\mathcal{N}^{(s)}}{10^3}\, \frac{dN^{(p)}/dz}{10^2\ {\rm deg}^{-2}}\, \frac{f_i}{0.1} \right)^{-1/2} \left( \frac{1 + z}{2} \right)^{0.4}, \tag{46} $$

where $f_i$ is the fraction of the photometric galaxies in redshift bin $i$ (and equals the distant observer $\beta_i$ in the case of redshift independent clustering). This expression shows that $\mathcal{N}^{(s)} \times dN^{(p)}/dz \gtrsim 10^6$ deg$^{-2}$ is required for cross-correlations to be fruitful. The right panel in Fig. 7 shows the constraints on the $N_i^{(p)}$, again with the specifications $i^{(p)} = 23$ and $10^5$ total spectroscopic galaxies, but taking $dN^{(s)}/dz = 10$ deg$^{-2}$ for all the curves and assuming that only a fraction, $f$, of photometric galaxies are used in the cross correlations. When both the photometric and spectroscopic galaxies are in the rare limit, Eq. (46) shows that the sensitivity scales as $f^{-1/2}$. We note that the peak of $dN/dz$ for a survey complete to $i = 23$ equals $5 \times 10^4$ deg$^{-2}$, so the $f \lesssim 0.01$ curves should be in this limit, and we indeed find this scaling in this regime. This panel illustrates that cross-correlations can be used to constrain the redshift distribution of peculiar objects, comprising a part in $10^3$ of the photometric sample in the case shown, and not just the full sample.
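The $f^{-1/2}$ scaling and the threshold on the product of spectroscopic number and photometric density can be made concrete with a short function. This is a sketch under an assumed reading of Eq. (46): prefactor 1.7, normalizations of $10^3$ spectra, $10^2$ deg$^{-2}$ and $f_i = 0.1$, and a redshift exponent of 0.4. The bias and growth-factor defaults of unity are placeholders, not the paper's fiducial values.

```python
def rare_rare_error(n_spec_total, dndz_phot, f_i,
                    b_s=1.0, b_p=1.0, growth=1.0, z=1.0):
    """Fractional error in the rare-rare limit (assumed form of Eq. 46).

    n_spec_total : total number of spectroscopic galaxies
    dndz_phot    : photometric dN/dz in deg^-2
    f_i          : fraction of photometric galaxies in redshift bin i
    """
    density_term = (n_spec_total / 1e3) * (dndz_phot / 1e2) * (f_i / 0.1)
    prefactor = 1.7 / (b_s * b_p * growth ** 2)
    return prefactor * density_term ** -0.5 * ((1 + z) / 2) ** 0.4


PEAK_DNDZ = 5e4  # deg^-2, peak dN/dz quoted in the text for a survey complete to i = 23

# Using 1 per cent versus 0.1 per cent of the photometric sample.
err_f_001 = rare_rare_error(1e5, PEAK_DNDZ * 0.01, 0.1)
err_f_0001 = rare_rare_error(1e5, PEAK_DNDZ * 0.001, 0.1)
scaling = err_f_0001 / err_f_001  # expected to equal sqrt(10)

# Product of spectra and photometric density equal to 10^6 deg^-2.
threshold = rare_rare_error(1e3, 1e3, 0.1)
print(err_f_001, err_f_0001, scaling, threshold)
```

With unit bias and growth factor the error at the $10^6$ deg$^{-2}$ threshold is of order one half; realistic growth factors below unity push it toward order unity, which is the sense in which that product marks where cross-correlations become fruitful.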

The derivations that led to Eq. (46) implicitly assumed that the bias of the spectroscopic sample is known from auto-correlation function measurements. However, in the limit of a rare spectroscopic sample, the auto correlations can be much noisier than the cross correlations, calling into question this assumption. We show in Appendix A2 that in this case the fractional variance on the $N_i^{(p)}$ is simply the fractional variance quoted in this section added to the fractional variance in the bias measurement.

Because the two limits given by Eqs. (45) and (46) yield similar $\delta N(z)/N(z)$ at the transition between the two regimes (at $dN^{(p)}/dz \sim 0.1\, [dN^{(p)}/dz]_{\rm crit}$), the sensitivity of an arbitrary photometric survey can be obtained by interpolating between them.



3.6 Generalizing the Schur Limit 

We showed that in the Schur-Limber limit the Fisher matrix is diagonal. However, empirically we find that the off-diagonal terms in Eq. (30) are generally unimportant for determining the uncertainty in the $N_i^{(p)}$: The dashed green contours in Fig. 2 show the impact of dropping the off-diagonal term in Eq. (30), but keeping the $S$ factors so that $F(\ell) = S\, F^S(\ell)$ and similarly $S$ appears in the summation in the estimator. This "generalized" estimator is unbiased. Empirically, this approximation (the short-dashed green curves) differs from the full Fisher Matrix (the solid black contours) by a factor of $\sim 1 - N_{\rm bin}^{-1} \approx 0.9$.

There are several idealized cases where one can show that dropping the off-diagonal elements of $F$ is a good approximation to $\sim 1 - N_{\rm bin}^{-1}$ (such as the case of a single $\ell$), but only our numerical results can be used to justify the result in general. The approximation of ignoring off diagonals when computing the estimator variance from $F$ is equivalent to not marginalizing over parameters other than $N_i^{(p)}$. That $F^{-1}$ is approximately diagonal thus means that one does not have to simultaneously estimate each of the $N_i^{(p)}$ and rather can estimate each parameter independently for $[N_i^{(p)}]_{\rm last}$ near the peak likelihood.
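The single-$\ell$ case mentioned above can be worked through numerically. The sketch below assumes that, for one mode and in fractional units, the Limber Fisher matrix takes the form $N_i N_j F_{ij} = S\,(u_i \delta_{ij} + 2 S u_i u_j)$ with $u_i = r_i^2$, and that the "generalized" approximation keeps only $S u_i$ on the diagonal. It takes ten identical bins with strong correlation (large $S$) and compares the two variances. All names and values are illustrative.

```python
def fractional_fisher(u):
    """Single-mode Fisher matrix in fractional units, with u_i = r_i^2."""
    S = 1.0 / (1.0 - sum(u))
    n = len(u)
    F = [[2.0 * S * S * u[i] * u[j] for j in range(n)] for i in range(n)]
    for i in range(n):
        F[i][i] += S * u[i]
    return F, S


def invert(M):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(M)
    aug = [list(M[i]) + [1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


n_bin = 10
eps = 1e-3                         # sum of r_i^2 is 1 - eps, so S = 1/eps
u = [(1.0 - eps) / n_bin] * n_bin

F, S = fractional_fisher(u)
var_full = invert(F)[0][0]         # marginalized variance from the full matrix
var_generalized = 1.0 / (S * u[0])  # variance from F = S * F^S alone
ratio = var_full / var_generalized
print(f"S = {S:.0f}, variance ratio = {ratio:.4f}")
```

In this idealized setup the ratio approaches $1 - 1/N_{\rm bin}$ as $S$ grows, matching the factor of about 0.9 quoted for ten bins.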



4 APPLICATIONS 

The previous section built intuition for the behavior of, and relevant scales for, the estimator. To bring out the appropriate limits we considered simple $dN/dz$ distributions, such as constants. This section considers more physically motivated parameterizations for the galactic populations. Fig. 8 is analogous to Fig. 4 but quantifies the scales that contribute to the constraint on the $N_i^{(p)}$ for realistic source models, plotting $d[1/F^{-1}_{ii}]/d\log\ell$. In particular, Fig. 8 considers the models:

top panel: $i^{(s)} = 23$ over 40 deg$^2$, and $i^{(p)} = 25.3$, characteristic of the LSST gold sample,

bottom panel: $dN^{(s)}/dz = 10$ deg$^{-2}$ over $10^4$ deg$^2$ and $0 < z < 2.5$, characteristic of SDSS quasars, and again $i^{(p)} = 25.3$.

Figure 8. Similar to Fig. 4, but for more physical cases. Plotted is the information as a function of $\ell$, $d[1/F^{-1}_{ii}]/d\log\ell$. The variance in $N_i^{(p)}$ is the inverse of the area under these curves. The filled circles show $\ell_{\rm NL} = \chi k_{\rm NL}$ (c.f. Eq. 34). The top (bottom) panel considers a 40 deg$^2$ ($10^4$ deg$^2$) survey and takes bins of $\Delta z = 0.05$ spanning $0 < z < 2.5$.

In the model in the bottom panel, the kernel peaks near the scale $\ell_{Pk-2}$, which corresponds to $\ell = 400$, 700 and 900 at $z = 0.5$, 1, and 1.5. This is as expected when at least one sample is abundant. In the model in the top panel, the information has a broad peak that falls between $\ell_{Pk-2}$ and $\ell_0$, where $\ell_0 = 800$, 2000, and 3000 for the three redshifts considered. This is consistent with our arguments for the case of two abundant samples. In both of the models considered in Fig. 8, the majority of the information arises from linear scales (scales which fall leftward of the filled dot, representing $\ell_{\rm NL}$ on each curve). We find similar conclusions apply for a range of models.

Fig. 9 investigates the tradeoffs of depth versus area for attempts to constrain the $N_i^{(p)}$ in 50 redshift bins with $\Delta z = 0.05$ and spanning $0 < z < 2.5$. The top panel is for a photometric sample with the specifications of the LSST gold sample (which has $dN^{(p)}/dz > 10^4$ deg$^{-2}$ over the entire redshift range) and for three spectroscopic samples that could be obtained with the same total time on a telescope. (More correctly, the limiting flux squared divided by the survey area is held constant.) We assume that the spectroscopic followup covers 40 deg$^2$ at $i^{(s)} = 23$. Hence, it covers 1,600 deg$^2$ at $i^{(s)} = 21$ and 1.0 deg$^2$ at $i^{(s)} = 25$. This panel illustrates that deeper is not necessarily better (compare only the solid curves for the time being). This conclusion arises because the spectroscopic galaxies are more or less in the abundant limit (particularly near their peak in $dN^{(s)}/dz$) where the fractional error does not depend on depth and instead scales as $f_{\rm sky}^{-1/2}$. However, the scaling, which implies a factor of 6 between the three cases considered in the top panel, over predicts what is seen in this panel. This arises because these samples are only marginally in the $dN/dz$ regime where we find this scaling holds (Fig. 7) and also because $S > 1$ breaks this scaling, which was derived in the $S = 1$ limit. The $i^{(s)} = 21$ sample is in the rare sample limit at the highest redshifts shown, and hence its errors blow up there. By contrast, while the $i^{(s)} = 25$ sample is the least sensitive to $dN^{(p)}/dz$ at intermediate redshifts (owing to its small $f_{\rm sky}$), it is the most able to determine the distribution at the highest redshifts.

Figure 9. Shown are estimates for the sensitivity to reconstruct redshifts of a photometric sample in redshift bins of $\Delta z = 0.05$ and spanning $0 < z < 2.5$. The top panel is for a photometric sample with the specifications of the LSST gold sample ($i^{(p)} = 25.3$; see text) and for different spectroscopic samples that could be obtained for the same total telescope time: The spectroscopic followup covers 1 deg$^2$ to $i^{(s)} = 25$, 40 deg$^2$ to $i^{(s)} = 23$, and 1,600 deg$^2$ to $i^{(s)} = 21$. The middle panel is similar to the top panel but assumes a fraction $f_s$ of galaxies to $i^{(s)} = 23$ are observed over a region of $40\, f_s^{-1}$ deg$^2$. The bottom panel is for a spectroscopic sample with the specifications of BigBOSS and the specified limiting photometric magnitudes. This panel assumes that the surveys' overlap is $10^4$ deg$^2$, but the quoted error scales as the inverse square root of the survey area. In both panels, the dashed curves use all $\ell$ values, whereas the solid exclude information from $\ell > \ell_{\rm NL}$. The dot-dashed curves (shown only for the $i = 23$ case) in the top and bottom panels are the variance of the Newman-analog estimator without any cutoff at nonlinear scales.
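The survey areas quoted for the three depths follow from holding the limiting flux squared divided by the area constant, and the same arithmetic gives the naive factor-of-6 change in error between adjacent cases. A minimal sketch, assuming the standard magnitude-flux relation (flux ratio $10^{-0.4\,\Delta m}$) and using hypothetical function names:

```python
def area_at_fixed_time(i_lim, ref_area_deg2=40.0, ref_i_lim=23.0):
    """Survey area when (limiting flux)^2 / area is held constant."""
    # Limiting flux relative to the reference depth.
    flux_ratio = 10 ** (-0.4 * (i_lim - ref_i_lim))
    return ref_area_deg2 * flux_ratio ** 2


areas = {i_lim: area_at_fixed_time(i_lim) for i_lim in (21.0, 23.0, 25.0)}

# Naive area-only error scaling (f_sky^-1/2) between adjacent depths.
error_ratio = (areas[23.0] / areas[25.0]) ** 0.5

for i_lim, area in areas.items():
    print(f"i = {i_lim:.0f}: {area:.1f} deg^2")
print(f"error ratio between adjacent cases: {error_ratio:.2f}")
```

Two magnitudes of depth correspond to a factor of about 40 in area, so an error that depended on area alone would change by about $\sqrt{40} \approx 6$ from one case to the next.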

The middle panel in Fig. 9 is similar to the top panel but assumes a random fraction, $f_s$, of all galaxies to $i^{(s)} = 23$ are observed over a region of $40\, f_s^{-1}$ deg$^2$ such that the total number of galaxies is fixed. This panel reinforces our result that the constraint on the $N_i^{(p)}$ depends primarily on the total number of spectroscopic galaxies and not their angular density, even though the case with $f_s = 1$ is in our abundant limit in which we no longer expect this scaling to hold. We still find this result approximately holds.

The bottom panel in Fig. 9 shows the case of a spectroscopic sample with the specifications of BigBOSS (whose $dN/dz$ is shown in Fig. 1) and the specified limiting photometric magnitudes (see footnote 11). This panel assumes that the surveys' overlap is $10^4$ deg$^2$, but the error scales as the inverse square root of the overlapping area. Despite the lower number densities of galaxies in this case compared to the top panel, BigBOSS has a total number of galaxies that exceeds the other cases by more than an order of magnitude and, thus, is the most sensitive of all the cross-correlation examples considered in Fig. 9. These sensitivity predictions are in qualitative agreement with our previous estimate for the rare case: evaluating Eq. (45) for $\mathcal{N}^{(s)} = 10^7$ galaxies and $\beta = 0.1$ yields 0.01, in qualitative agreement with the sensitivity minimum in Fig. 9. We note that to reach the $10^{-2}$ sensitivity quoted here, BigBOSS would likely need to correct for magnification bias (which is discussed in Section 6).

Omitting nonlinear scales or introducing a redshift cutoff
in the spectroscopic coverage has little impact on our results.
The dashed curves in Fig. 9 include information from
$\ell > \ell_{\rm NL}$, whereas the solid curves do not. Excluding nonlinear
modes in the analysis has only a modest impact on
the estimator, except in the $i^{(s)} = 25$ case in the top panel,
where the constraint is degraded by a factor of 3. This case
is most impacted because (1) its $\ell_0$ falls at the most nonlinear
scales of the cases plotted and (2) the small 1 deg field
assumed in this case has already limited the scales that can
contribute. Similar losses for each of the plotted cases also
occur for a factor of 2 smaller $\ell_{\rm NL}$. In addition, we have
assumed the spectroscopic sample spans the entire redshift
range of the photometric sample. A cutoff in the coverage
of a spectroscopic sample, as could occur if an emission line
falls out of the spectroscopic band of a survey, has little impact
on our results below that cutoff. It has no impact to
the extent that $S = 1$, which more or less holds for all cases
shown in Fig. 9, with the largest differences occurring for the
deepest case considered in the top panel: when the additional
condition $dN^{(s)}/dz = 0$ was imposed for $z > 1.5$, which



85 /sVy- However, the scaling 
the three cases considered in the top panel 



11 BigBOSS aims for a combined dN/dz that we crudely
parametrize as $30 \times 10^{2.1z}$ deg$^{-2}$ for $z < 1.0$ and
$4000 \times 10^{-1.1(z-1)}$ deg$^{-2}$, to approximate what is quoted at
http://bigboss.lbl.gov
forces $S$ to be small, we found no change to the $i^{(s)} = 21$
case in the top panel, but a factor of 2.5 shift upward for
$i^{(s)} = 25$.

The photometric sample can often be divided into magnitude
bins or into photometric redshift bins. For magnitude
cuts, extra sensitivity is often gained by dividing the primary
photometric sample because galaxies in different magnitude
bins are more likely to also be at different redshifts (as suggested
by the top panel in Fig. 9). In particular, in the rare
spectroscopic galaxy limit but where the photometric galaxies
are more abundant than $[dN/dz]_{\rm crit}$, the signal scales inversely
with the redshift extent of the photometric sample
and does not depend on the $dN^{(p)}/dz$ (Eq. 45). Thus, the
sensitivity is not improved by going deeper. The redshift
distribution of galaxies given by our parameterization for
$P(z|i)$ (Eq. 3) has mean $3z_0$ and variance $3z_0^2$. Because the
variance of $P(z|i)$ increases with depth, deeper surveys will
be somewhat less sensitive at the peak of $P(z|i)$ unless the
sample is partitioned.$^{12}$ A partitioned sample can be easily
accommodated in our quadratic estimator formalism. In
Section 7, we discuss the gains from dividing by photometric
redshift.
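
The quoted mean and variance can be checked numerically. The sketch below assumes that the parameterization of Eq. (3) has the form $P(z|i) \propto z^2 \exp(-z/z_0)$, which is the simplest distribution with mean $3z_0$ and variance $3z_0^2$; the function name and the value of $z_0$ are illustrative.

```python
import math

def redshift_moments(z0, z_max_factor=60.0, n=20000):
    """Mean and variance of P(z) proportional to z^2 exp(-z/z0).

    Uses midpoint integration on 0 < z < z_max_factor * z0; the assumed
    functional form is an illustration, not taken verbatim from Eq. (3).
    """
    dz = z_max_factor * z0 / n
    norm = first = second = 0.0
    for k in range(n):
        z = (k + 0.5) * dz
        weight = z * z * math.exp(-z / z0)
        norm += weight
        first += weight * z
        second += weight * z * z
    mean = first / norm
    return mean, second / norm - mean * mean

mean, variance = redshift_moments(z0=0.3)
print(mean, variance)  # compare with 3 * z0 and 3 * z0**2
```

Because the variance grows as $z_0^2$, the width of the distribution grows in proportion to its mean, which is why a deeper unpartitioned sample spreads over more redshift bins.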



5 CONFIGURATION SPACE 

The previous derivations were done in spherical harmonic 
space as this is the simplest basis for calculating the mini- 
mum variance estimator. However, when dealing with actual 
data it can be more difficult to work with spherical harmon- 
ics as the survey window function enters nontrivially in con- 
volution. Hence many galaxy clustering analyses are done in 
configuration space. In this section we show that the min- 
imum variance estimator can be easily applied in this dual 
space (Section 5.1), we compare with previous configuration 
space dN/dz estimators (Section 5.2), and finally discuss the 
impact of finite sky coverage (Section 5.3).



5.1 Configuration space estimator 

The harmonic space quadratic estimator can be written in
the form

$$\sum_{\ell, m} v_i(\ell)\, p(\ell, m)^*\, s_i(\ell, m), \qquad (47)$$

for some $v_i(\ell)$, plus analogous terms proportional to the
auto correlations. Writing $p/s_i(\ell, m) = \int d\hat{n}\; p/s_i(\hat{n})\, Y_\ell^m(\hat{n})$,
Eq. (47) becomes

$$\int d\hat{n}\, d\hat{n}'\; p(\hat{n})\, v_i(\hat{n} \cdot \hat{n}')\, s_i(\hat{n}'), \qquad (48)$$

where we have used the addition theorem for spherical harmonics
(Abramowitz & Stegun 1972), $P_\ell$ is the Legendre
polynomial of order $\ell$, and

$$v_i(x) = \sum_\ell \frac{2\ell + 1}{4\pi}\, v_i(\ell)\, P_\ell(x). \qquad (49)$$



12 This statement holds as long as $dN^{(p)}/dz > [dN/dz]_{\rm crit}$. This
inequality is satisfied near the peak of $P(z|i)$ down to the lowest
magnitudes for which Eq. (3) is calibrated, $i = 20.5$ (see Fig. 1).



If we define $\omega_{p s_i}(x) = \langle p\, s_i \rangle_x$ as the correlation function
estimate, where $x = \hat{n} \cdot \hat{n}'$ and $\langle \ldots \rangle_x$ represents an average
over all separation angles $x$ in the survey, Eq. (48) can be
re-expressed as

$$8\pi^2 \int dx\; v_i(x)\, \omega_{p s_i}(x). \qquad (50)$$



Thus, the configuration space estimator in the Schur-Limber
limit is

$$\hat{N}_i^{(p)} = [N_i^{(p)}] + 8\pi^2 F_{ii}^{-1} \sum_a \Delta x_a\, v_i(x_a) \times \left[ \hat{\omega}_{p s_i}(x_a) - \omega_{p s_i}(x_a) \right], \qquad (51)$$

where $a$ runs over the bins in (cosine of the) angle. A similar
configuration space estimator can be written for the full
minimum variance quadratic estimator (Eq. 16).
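
The Legendre sum that converts harmonic-space weights into configuration-space weights, read here as $v_i(x) = \sum_\ell (2\ell+1)/(4\pi)\, v_i(\ell) P_\ell(x)$ from Eq. (49), is straightforward to evaluate with the standard three-term recurrence. The following is a minimal sketch; the function names and the example weights are hypothetical.

```python
import math

def legendre_all(x, ell_max):
    """Return [P_0(x), ..., P_ell_max(x)] via the three-term recurrence."""
    p = [1.0, x]
    for n in range(1, ell_max):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    return p[: ell_max + 1]

def config_space_weight(v_ell, x):
    """v(x) = sum_ell (2 ell + 1) / (4 pi) * v_ell[ell] * P_ell(x)."""
    p = legendre_all(x, len(v_ell) - 1)
    return sum((2 * ell + 1) / (4.0 * math.pi) * v * p[ell]
               for ell, v in enumerate(v_ell))

# Example: Gaussian-damped harmonic weights (an arbitrary illustrative choice)
weights = [math.exp(-(ell / 200.0) ** 2) for ell in range(1000)]
for theta_deg in (0.1, 0.5, 1.0):
    x = math.cos(math.radians(theta_deg))
    print(theta_deg, config_space_weight(weights, x))
```

The sum should be truncated only where the harmonic weights have become negligible, since a sharp cutoff in $\ell$ produces ringing in $v_i(x)$.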

For $\theta \ll 1$ radian (the scales that we will show are of primary
interest), the result can be further simplified by making
the flat sky approximation. Then, the Parseval identity,
$\int d^2\ell\; v^*(\ell)\, ps(\ell)/(2\pi)^2 = \int d^2\theta\; v(\theta)\, ps(\theta)$, can be directly
applied to Eq. (47) to yield Eq. (51) with $\Delta x_a \rightarrow \theta_a \Delta\theta_a$
and

$$v_i(\theta) = \int_0^\infty \frac{\ell\, d\ell}{2\pi}\, v_i(\ell)\, J_0(\ell\theta). \qquad (52)$$

The same expression can be derived from Eq. (49) by writing
the small-angle limit of $P_\ell$ in terms of $J_0$ (Abramowitz &
Stegun 1972).
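
A minimal numerical sketch of the flat sky transform, $v(\theta) = \int_0^\infty \ell\, d\ell/(2\pi)\, v(\ell) J_0(\ell\theta)$, is given below. It uses only the standard library, with $J_0$ evaluated from its integral representation. The Gaussian test weight $\exp[-\ell^2/\ell_{\rm NL}^2]$ and the value of $\ell_{\rm NL}$ are assumptions chosen because the transform is then known analytically, $v(\theta) = \ell_{\rm NL}^2/(4\pi) \exp[-\ell_{\rm NL}^2\theta^2/4]$, which provides a check.

```python
import math

def bessel_j0(x, n=64):
    """J_0(x) from (1/pi) * integral_0^pi cos(x sin t) dt, midpoint rule."""
    h = math.pi / n
    return sum(math.cos(x * math.sin((k + 0.5) * h)) for k in range(n)) / n

def flat_sky_weight(v_ell, theta, ell_max, n_ell=4000):
    """Approximate v(theta) = int_0^ell_max ell d(ell)/(2 pi) v(ell) J_0(ell theta)."""
    d_ell = ell_max / n_ell
    total = 0.0
    for k in range(n_ell):
        ell = (k + 0.5) * d_ell
        total += ell * v_ell(ell) * bessel_j0(ell * theta)
    return total * d_ell / (2.0 * math.pi)

ELL_NL = 1000.0  # hypothetical nonlinear cutoff multipole

def gaussian_cutoff(ell):
    """Test weight v(ell) = exp(-ell^2 / ELL_NL^2)."""
    return math.exp(-(ell / ELL_NL) ** 2)

theta = math.radians(0.1)
numeric = flat_sky_weight(gaussian_cutoff, theta, ell_max=8.0 * ELL_NL)
analytic = ELL_NL ** 2 / (4.0 * math.pi) * math.exp(-(ELL_NL * theta) ** 2 / 4.0)
print(numeric, analytic)
```

The default of 64 quadrature points in `bessel_j0` is adequate only for moderate arguments (here $\ell\theta \lesssim 15$); larger arguments would need more points or a library Bessel routine.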

We note that in the Schur-Limber limit 



Vi{£) 



(53) 

The thick curves in the top panel in Fig. 10 show the
flat sky weighting kernel for the same example surveys as in
Fig. 4, down-weighting nonlinear modes by multiplying $v_i(\ell)$
by the factor $\exp[-\ell^2/\ell_{\rm NL}^2]$. These calculations show that if
any sample is in the abundant limit, the window peaks at
$\theta \sim 0.1$ deg separations, whereas if both surveys are in the
rare limit the peak occurs at $\theta \sim 1$ deg. Both cases have
non-negligible weight at super-degree scales.

The bottom panel in Fig. 10 shows $\theta v_i(\theta) \times \omega_{p s_i}$, which
better represents the $\theta$ that contribute to the final estimate.
Since measured correlations are weaker on large scales than
small, the $\theta > 1$ deg behavior of $v_i(\theta)$ is down-weighted and
really only sub-degree scales contribute significantly.

In practice, whether weights are applied during or after
the computation of the correlation function depends on the
survey to which cross-correlations are applied. In the case
where the survey's contiguous area is much larger than the
kernel of $v_i(x)$ ($\gg 0.1$ to 1 deg), the exact details of the survey
window are irrelevant. The $\omega_{p s_i}(\theta)$ can be estimated with
standard techniques (e.g. Landy & Szalay 1993; Hamilton
1993; Bernstein 1994) and then multiplied by the approximate
$v_i$. This is the regime of most large-scale photometric
and spectroscopic surveys, such as SDSS, WiggleZ, BOSS,
GAMA, DES, and LSST. The second regime, where the survey
area is comparable to or smaller than the weighting kernel
(e.g. with DEEP or HST fields), is more complex. Section
5.3 discusses this case.
Figure 10. The top panel shows the configuration-space weights
of the optimal estimator, $\theta \times v_i(\theta)$, in the Limber and flat sky
approximations for the illustrative cases considered in Fig. 4, again
for the i = 6 redshift bin. The bottom panel shows this times the
angular correlation function, $\omega_{p s_i}$, which shows explicitly from which
angular scales the information derives. The thin solid green curve
in each panel is the weighting scheme used in our analog of the
Newman (2008) estimator, with $r_{\max} = 10\, h^{-1}$ Mpc. All of the
curves, aside from the Newman-analog ones, have down-weighted
nonlinear modes by the factor $\exp[-\ell^2/\ell_{\rm NL}^2]$.



5.2 Comparison to earlier work 

Using cross-correlations to estimate redshift distributions
has been championed recently by Newman (2008) and
Matthews & Newman (2010). The configuration space expression
for the optimal quadratic estimator (cf. Eq. 51)
allows us to compare explicitly with the Newman (2008)
method. Though the Newman (2008) method is neither optimal
nor unbiased, it has some similarities to our estimator,
as we shall see.

The estimator in Newman (2008) and Matthews & Newman
(2010) is fairly complicated, as it involves nonlinear,
power-law fits to correlation functions over a specified range
of scales and with specified, diagonal (i.e. ignoring bin-to-bin
correlations in $\theta$ and $z$) weights. The estimator is thus a nonlinear
functional of the measured 2-point functions. However,
since the power-law fit is used mainly to divide out trends
and fit for an amplitude, we can write an analogous estimator
to Newman (2008) that contains essentially the same
information. Our analog-estimator becomes very similar to
that of Newman (2008) for power-law models.$^{13}$

Our analog of the Newman (2008) estimator is 14 



N, 



(p) 



-1 New / 



pSi 



where 



(54) 



(55) 



This estimator returns $N_i^{(p)}$ if the Limber approximation
holds and the underlying power spectra and biases are correctly
guessed. When the sum in Eq. (54) is over configuration
space pixels (as in Newman 2008), the weighting is

$$v_i^{\rm New}(r) = \begin{cases} 1 & r_{\min} < r < r_{\max} \\ 0 & \text{otherwise} \end{cases} \qquad (56)$$

where Newman (2008) chooses $r_{\min} = 0$ and $r_{\max} =
10\, h^{-1}$ Mpc. Fig. 10 compares the weights of our optimal
estimator to that of our Newman-analog estimator.
The thin green solid curve in the top panel is $\theta v_i^{\rm New}(\theta)$
and in the bottom panel this curve is $\theta v_i^{\rm New}(\theta) \times \omega_{p s_i}$. The
thick curves are this same quantity for the optimal estimator
for the same four extreme cases as considered earlier.
The Newman-analog estimator uses similar scales to those
selected by the optimal estimator, especially in the rare-rare
case.
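
To see why a constant weight out to $10\, h^{-1}$ Mpc selects sub-degree scales, the top-hat weighting can be sketched as below, together with the angle that the outer radius subtends. The comoving distance of 2300 $h^{-1}$ Mpc (roughly $z \sim 1$ in a standard cosmology) and the choice $r_{\min} = 0$ are assumptions made for this illustration, and the function names are hypothetical.

```python
import math

def newman_weight(r, r_min=0.0, r_max=10.0):
    """Top-hat weight: 1 for r_min < r < r_max (in h^-1 Mpc), else 0."""
    return 1.0 if r_min < r < r_max else 0.0

def max_angle_deg(r_max, chi):
    """Angle (degrees) subtended by transverse separation r_max at distance chi."""
    return math.degrees(r_max / chi)

chi_example = 2300.0  # h^-1 Mpc; illustrative comoving distance near z ~ 1 (assumed)
print(max_angle_deg(10.0, chi_example))
```

For these inputs the outer edge of the top hat corresponds to roughly a quarter of a degree, comparable to the scales at which the optimal weights peak.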

While the weights for the optimal quadratic and
Newman-analog estimators are superficially similar, a more
detailed examination of the weights shows that the estimators
behave differently. The optimal estimator
in the shot noise-limited regime has configuration-space
weights given by the density correlation function. However,
the Newman-analog weights are simply a constant. The
structure of the Newman-analog estimator is also much different
in the signal-dominated regime. The optimal estimator
has weight $v_i(\theta) \propto \int \ell\, d\ell\; S\, C^{-1} J_0(\ell\theta)$, assuming the
Limber and diagonal estimator approximations, in contrast
to the constant configuration space weights assumed in the
Newman analog.

The variances of these estimators also differ. The covariance
of the minimum variance estimator is $F^{-1}$, whereas the
covariance of the Newman-analog estimator (in the Limber
approximation) is



(57) 



$\times\, [A_{0i}(\ell) A_{0j}(\ell) + A_{00}(\ell) A_{ij}(\ell)]$,
where the Fourier space (flat sky) Newman weights are the 



13 We must further assume that the power-law slope of the spec- 
troscopic sample equals that for the photometric sample. On large 
scales this is guaranteed, as both trace the matter power spec- 
trum. However on small scales it depends on the manner in which 
objects populate haloes. Newman (2008) and Matthews & New- 
man (2010) assume that the cross-correlation function is the ge- 
ometric mean of the auto-correlation functions. 

14 While Newman (2008) does not explicitly subtract a shot-noise
term, we have subtracted $w^{(ps)}$ so that the estimator is well-defined
in both configuration and harmonic space.
Hankel transform of Eq. (56):

/ew^ = X» f Jl(^?- max /x») _ Jl(£r m in/Xi) \ ^ 
£ \ ^max ^"min / 

The rapid oscillations at higher $\ell$ damp the contribution of
these modes. The dot-dashed curves in Fig. 9 (shown only
for the $i^{(s)} = 23$ case) in the top and bottom panels are the
variance of the Newman-analog estimator without any nonlinear
cutoff in $\ell$. The Newman-analog estimator performs
substantially worse than the optimal estimator: a factor of
3 to 10, with the factor of 10 applying to the abundant galaxy
case (which is most similar to the cases investigated in Newman
2008 and Matthews & Newman 2010).

5.3 Finite sky coverage 

Until now many of our expressions have implicitly assumed 
that the surveys cover the full sky, which is unlikely to be 
the case in practice. For surveys whose narrowest dimension 
is much larger than the scales where our estimator peaks, 
the correction for finite sky coverage is benign: we simply
have a factor of $f_{\rm sky}$ to correct the number of modes in
our Fisher matrix (e.g. Scott et al. 1994; Jungman et al.
1996; Tegmark 1996; Knox 1997), as we have assumed in our
prior example calculations. The effects of finite sky coverage
have been studied extensively in the CMB (e.g. Hivon et al. 
2002; Hansen et al. 2002; Efstathiou 2004) and large-scale 
structure literature (e.g. Feldman et al. 1994; Peacock & 
Nicholson 1991; Park et al. 1994; Tegmark et al. 1998). 

If the area of the sky covered by the calibrating, spectroscopic
survey is small (< a few deg on a side), then the
effects of the window function (the function that describes
a survey's sky coverage) become important, as our estimator
uses separations out to $\sim 1$ deg (especially in the case in
which both samples are sparse). For such small fields we can safely
make the flat sky approximation, so that our harmonic-space
description becomes a Fourier-space description. The
minimum variance quadratic cross-correlation estimator
for $N_i^{(p)}$ then takes the form

$$\sum_{k_1, k_2} p(k_1)\, Q(k_1, k_2)\, s_i(k_2), \qquad (59)$$

where $Q$ is a kernel which is no longer diagonal in $k$-space.

The case of a general survey window function can be
complex, but, if the width and height of the window are
comparable, the effects of windowing are easily understood.
Due to the convolution with the window function, $k$-modes
which are separated by less than $2\pi/\Theta$ (where $\Theta$ is the angular
extent of the window function) are almost completely
correlated and, thus, contain largely redundant information.
For modes separated by much more than $2\pi/\Theta$, in contrast, the
effects of the window function can be largely ignored.

Thus the effects of finite sky coverage can be taken into
account by replacing our sums over $\ell$ with sums over $L$ values
which are integer multiples of $2\pi/\Theta$ and defining the $C_L$ as
bin-averages of the $C_\ell$. A simpler approximation, valid if the
theoretical spectra are smooth, is to simply integrate from
$2\pi/\Theta$ to infinity rather than zero to infinity in Eq. (52).
If in computing the correlation function or power spectrum
we estimate the mean density from the survey itself, then
the power is suppressed on large scales (often known as the
integral constraint; Peebles 1980). An approximation to this
suppression is to multiply $C_\ell$ by $|1 - W(\ell)|^2$, where $W(\ell)$ is
the window function normalized so that $W \rightarrow 1$ as $\ell \rightarrow 0$.
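
The mode-counting prescription above can be sketched in a few lines: compute the mode spacing $2\pi/\Theta$ for a window of angular extent $\Theta$, then average a theoretical spectrum in bins of that width centred on integer multiples of the spacing. Centring the bins on $L$ is a choice made for this sketch, the power-law spectrum is purely illustrative, and all function names are hypothetical.

```python
import math

def fundamental_mode(window_deg):
    """Mode spacing 2*pi/Theta for a window of angular extent Theta (degrees)."""
    return 2.0 * math.pi / math.radians(window_deg)

def bin_average(c_ell, ell_min, width):
    """Average c_ell(ell) over integer ell in [ell_min, ell_min + width)."""
    ells = range(int(math.ceil(ell_min)), int(math.ceil(ell_min + width)))
    return sum(c_ell(ell) for ell in ells) / len(ells)

def binned_spectrum(c_ell, window_deg, n_bins):
    """Return [(L, C_L)] with L at integer multiples of 2*pi/Theta."""
    spacing = fundamental_mode(window_deg)
    return [(k * spacing, bin_average(c_ell, (k - 0.5) * spacing, spacing))
            for k in range(1, n_bins + 1)]

def toy_spectrum(ell):
    """Illustrative smooth power-law spectrum (not a fit to any data)."""
    return 1.0e-4 * (ell / 100.0) ** -1.2

for big_l, c_big_l in binned_spectrum(toy_spectrum, window_deg=1.0, n_bins=5):
    print(round(big_l), c_big_l)
```

For a 1 deg field the spacing is 360, so only a handful of independent modes lie below the nonlinear scale, which is why small fields lose sensitivity.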



6 BIAS OF APPROXIMATE ESTIMATORS 

The minimum variance quadratic estimator under the
approximation that off-diagonal terms in the Fisher matrix
are zero is unbiased as long as the diagonal entries are
appropriately calculated. In addition, dropping derivative
terms in the quadratic estimator does not introduce bias, since each
derivative explores separate dependences. However, there
are a few approximations that could incur bias: the Limber
approximation, ignoring RSDs, including nonlinear scales,
ignoring cosmic magnification, and assuming the incorrect cosmology.
We do not consider the last of these because it should be
reduced to the per cent level with the coming generation
of cosmological probes, but we consider the others.$^{15}$ If the
bias is small, it can be easily computed by substituting the
full $\langle (p\, s)^* \times (p\, s) \rangle$ that includes the ignored terms into
the approximate estimator and evaluating both near the input
$N_i^{(p)}$. Using this formalism, we address these biases here.

Limber approximation and RSDs: 

In the Limber approximation, which has been assumed
by most previous investigations of dN/dz estimation from
cross-correlations, the diagonals are accurately estimated in
the limit $\ell \Delta\chi \gg \chi$ (although, in practice, this condition
has to be just weakly satisfied). Fig. 6 suggests that most
scales that contribute to our estimate are safely in this Limber
regime for $\Delta z \sim 0.1$. This will be less true for smaller
$\Delta z$. On angular scales favored by our estimator, at which
the matter power spectrum is decreasing with increasing $k$,
the Limber approximation results in an over-prediction of
the $C_{ii}$. Hence, our Schur-Limber estimator will result
in an under-prediction. However, setting to zero the $\langle p_i s_j \rangle$
for $i \neq j$ in the Limber approximation has the opposite effect.
We find that the former effect is larger such that Limber
results in an under-prediction, with a fractional error of
$-(2{-}3) \times 10^{-3}$ for $\Delta z = 0.01$ and $0 < z < 1$ for cases where
most of the information derives from $\ell_{\rm pk,-2}$ (i.e., where one
of the populations is abundant) and $-(0.3{-}1) \times 10^{-2}$ for
cases where most of the information derives from $\ell_{\rm pk,-1}$.$^{16}$
For $\Delta z = 0.1$, the biases are of course significantly smaller
than for $\Delta z = 0.01$. Thus, the Limber approximation will
likely result in a bias that is smaller than the estimator's
variance even for applications with very large source populations.
A corollary of Limber being a good approximation
is that extra sensitivity to the $N_i^{(p)}$ does not arise by first
estimating them in redshift bins with smaller redshift intervals
than the desired $\Delta z$ for $\Delta z > 0.1$. (Fig. 3 suggests that
there is some extra information for $\Delta z = 0.01$ as the Limber
estimate errors differ.)
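
To gauge where the Limber condition, read here as $\ell \gg \chi/\Delta\chi$, begins to be satisfied, the ratio $\chi/\Delta\chi$ can be evaluated for a redshift bin of width $\Delta z$. The sketch below assumes a flat $\Lambda$CDM cosmology with $h = 0.7$ and $\Omega_m = 0.3$; these parameter values and the function names are illustrative and are not taken from the text.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s

def hubble(z, h=0.7, omega_m=0.3):
    """H(z) in km/s/Mpc for flat LambdaCDM (assumed parameters)."""
    return 100.0 * h * math.sqrt(omega_m * (1.0 + z) ** 3 + 1.0 - omega_m)

def comoving_distance(z, n=1000, **cosmo):
    """chi(z) in Mpc from midpoint integration of c / H(z')."""
    dz = z / n
    return sum(C_KM_S / hubble((k + 0.5) * dz, **cosmo) for k in range(n)) * dz

def limber_ell_scale(z, delta_z, **cosmo):
    """chi / Delta chi: the Limber condition requires ell well above this value."""
    delta_chi = C_KM_S * delta_z / hubble(z, **cosmo)
    return comoving_distance(z, **cosmo) / delta_chi

for delta_z in (0.1, 0.01):
    print(delta_z, limber_ell_scale(1.0, delta_z))
```

Because $\Delta\chi$ is proportional to $\Delta z$, the multipole above which Limber applies is ten times higher for $\Delta z = 0.01$ than for $\Delta z = 0.1$, consistent with the larger Limber bias quoted for the narrower bins.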

The fact that the Limber approximation is as successful 
as it is suggests that redshift space distortions (RSDs) will 

15 If dN/dz is being estimated as part of a program aimed at 
constraining the cosmology, e.g. with gravitational lensing, the 
cosmology and dN/dz will have to be simultaneously varied. 

16 We speculate that the surprising smallness of the biases in 
Limber results because of a near cancellation of the two competing 
effects. 
also induce a small bias (as RSDs are negligible on scales
at which the Limber approximation holds; Appendix B). 
However, for reasons discussed in Appendix B, including 
RSDs is difficult in our current formalism as it requires a 
basis switch from our choice of top hat redshift bins, which 
spuriously magnify the impact of RSDs. Thus, we do not 
quantify the magnitude of their small bias on the estimator. 



Nonlinear scales and the one halo term: 

Using scales that are nonlinear can bias the estimator.
The Schur-Limber estimator for $N_i^{(p)}$ is biased by nonlinear
effects that occur at the redshift of the estimate, $z_i$, and
(fortunately) not by nonlinearities at other redshifts. This is
not the case for the minimum variance quadratic estimator
(a fact that we have ignored). In our estimates in Section
4 and Fig. 9, we masked nonlinear wavenumbers at $z_i$ that
met the criterion $k > k_{\rm NL}(z_i)$ (defined in Eq. 34), and found
that this operation does not have a large impact on the sensitivity,
except for the densest samples that were considered.
This result owes to the broad range in $\ell$ that contributes the
information, which generally peaks at $\ell < \ell_{\rm NL}$ (Fig. 4). We
find that if we reduce $k_{\rm NL}$ by an additional factor of 2, which
corresponds to a wavenumber where the nonlinear density
power spectrum deviates from linear theory by just 10 per
cent, the constraints are additionally degraded by a similarly
small factor.

As long as they are modeled properly, nonlinearities
that trace the density field do not necessarily bias
a measurement of $N_i^{(p)}$, as the galaxies still trace the
same large-scale density fluctuations. A bias will arise
if intra-halo correlations contribute at scales where they
are not in the white noise regime (as we have assumed).
Fortunately, deviations from the large-scale limit generally
occur at wavenumbers that are larger than $k_{\rm NL}$, especially
if clusters and large, low-redshift groups are excluded from
the cross-correlation analysis (see plots in Cooray & Sheth
2002).

Magnification bias: 

Magnification bias is the most significant of the biases
we considered. Cosmic magnification results in additional
off-diagonal terms in $C$ that were zero in the Limber approximation.
These terms are suppressed relative to the $j$-$j$
diagonal Limber term (Eq. 21) by the factor



R 



(x) 



oT + i 



(1 + *j) Xi A Xj 
2 x 10 7 Mpc 2 



i _ *i 



(60) 



for $i > j$, where $\alpha_i^{(x)}$ is the power-law index of the cumulative
source number counts in bin $i$ above a certain flux
threshold (see Appendix C). Eq. (60) ignores magnification-magnification
correlations, which are smaller except perhaps
for surveys at $z > 1$.

For our simple Schur-Limber estimator, it is easy to 
compute the estimator bias, being 



N, 



(p) 



frac. bias from mag. = ^ ■jAp) R k'i + 



k, k~>i 



k, k<Ci 



(61) 

where $C_{ii}$ is defined in Eq. (21). Thus, this estimator results
in an overestimate when $-\alpha^{(x)} - 1 > 0$. Evaluating this for
our toy case of constant dN/dz from $0 < z < 1$, one finds a
Figure 11. The estimator bias arising from magnification for
estimators that ignore this effect. The curves assume that the
photometric sample consists of all galaxies with $i^{(p)} < 25.3$ and
$0 < z < 2.5$ (the bias will be smaller in lower redshift samples)
and different spectroscopic samples. The thick blue curves are
the full quadratic estimator for BigBOSS and an overlap area
of $10^4$ deg$^2$, the black curves are $dN^{(s)}/dz = 10$ deg$^{-2}$ covering
$0 < z < 2.5$ and over $10^4$ deg$^2$, and the red curves are an example
survey with $i^{(s)} = 23$ and 40 deg$^2$. Solid (dashed) curves indicate
that the bias results in an overestimate (underestimate). The top
panel is the bias relative to $N_i^{(p)}$, and the bottom panel is this
relative to the fractional error. All curves simplistically assume
that the flux number counts of both populations have the rather
steep power-law index of $\alpha = -2$, to emphasize the effect. The
labelled thin blue curve in the top panel is the BigBOSS case with
just the diagonal Schur-Limber estimator.



$\sim -0.05\, (\alpha + 1)$ per cent bias that is roughly constant with
$z_i$. In addition, Eq. (61) shows that if $N_i^{(p)}$ is well below the
peak of the distribution, this bias can be particularly severe.

Fig. 11 illustrates the importance of magnification bias
for a case in which the photometric sample consists of all
galaxies with $i^{(p)} < 25.3$ and different spectroscopic samples,
all covering $0 < z < 2.5$. (Lower redshift samples
would be less biased by magnification.) For simplicity, we
take $\alpha_i = -2$ for all populations, which emphasizes the effect
(being characteristic of the bright end of quasar counts;
fainter quasars have slope $\alpha \sim -0.5$; Bartelmann & Schneider
2001; Scranton et al. 2005, and the faint-end slope for
galaxies is $-(0.5{-}1)$; Bouwens et al. 2012). The thick blue
curves represent BigBOSS and $10^4$ deg$^2$, the black curves a
survey with $dN^{(s)}/dz = 10$ deg$^{-2}$ over $10^4$ deg$^2$, and the red
curves a survey with $i^{(s)} = 23$ and 40 deg$^2$. Solid (dashed)
curves indicate that the bias results in an overestimate (underestimate).
The top panel is the bias relative to $N_i^{(p)}$, and
the bottom panel is this relative to the fractional error. At
$z < 1.5$, the bias is $\sim 1$ standard deviation for two of the
cases. However, for BigBOSS (which has fractional errors of
$\sim 10^{-2}$), the bias is $10\sigma$ over many of the redshift bins of
interest. For all the cases, the biases are largest at $z < 0.5$
and $z > 1.5$, redshifts at which there is a significant fall off in
the photometric population. The fact that these curves can
become negative contrasts with the Schur-Limber estimator,
which would always be biased high. The thin blue curves are
the Schur-Limber estimator for the case with BigBOSS. We
find that the bias of the Schur-Limber estimator (Eq. 61) is
typically larger than the bias of the full minimum variance
quadratic estimator (that ignores magnification).

In all cases magnification bias can be computed given
an estimate for the $\alpha_i$ and removed. The main issue is
uncertainty in the $\alpha_i$. It should be reasonably straightforward
to remove the bias at redshifts greater than the peak
in $dN^{(p)}/dz$ (where it is most severe) as the spectroscopic
galaxies act as the sources and their $\alpha_i^{(s)}$ is easily measured.
However, uncertainty in $\alpha_i^{(p)}$ could be the limiting factor in
$N_i^{(p)}$ constraints at redshifts where the photometric galaxies
act as the source, particularly in surveys that can place
percent-level errors on the $N_i^{(p)}$ and that extend to high redshifts.
In such cases, the error will be approximately set by
the fractional bias of $N_i^{(p)}$ owing to magnification (what is
plotted in Fig. 11) times the fractional uncertainty in $\alpha^{(p)}$.
Knowledge of $\alpha^{(p)}$ to $10\, |\alpha^{(p)} + 1|$ per cent precision is required
for this not to be the limiting factor for the BigBOSS
case considered above. Since magnification only depends
on the sources' $N_i$ and not their $b_i$, the significant
bias of BigBOSS also suggests that it can use magnification
to break this degeneracy and separately estimate the $b_i^{(s)}$
to $10\, |\alpha^{(s)} + 1|$ per cent precision. We revisit the impact of
magnification in Section 7, showing that it is less onerous in
the cases of (1) photo-z calibration and (2) estimating the
redshift distribution of diffuse backgrounds.

Analogous to magnification, intervening dust can also
correlate background galaxies with foreground ones for surveys
in the optical and bluer wavelengths (Menard et al.
2010). At linear scales, this effect will induce correlations
that are a biased tracer of the projected density. The magnitude
of this effect with redshift could be determined with
multi-band photometry using a population with uniform
spectra, e.g. quasars, and this information would allow it
to be corrected for in cross-correlation studies, again to the
extent that the $\alpha_i$ are known.



7 CALIBRATING PHOTOMETRIC 
REDSHIFTS AND CLEANING 
CORRELATED ANISOTROPIES FROM 
MAPS 

Our previous results can be generalized to spectroscopically 
calibrate the dN/dz of a photometric population that is par- 
titioned by photometric redshift, an application which is rel- 
evant for large-scale clustering and weak lensing analyses on 
photometric populations. When the catastrophic failure rate 
of the photometric redshift estimate is small, then it may 
be fruitful to self-calibrate by internal cross-correlations be- 
tween different photometric redshift bins. However, if the 
catastrophic failure rate is large, there can be large degen- 
eracies in the reconstruction from self calibrations, and it 
will be more robust to calibrate photometric redshifts with 



a spectroscopic sample. In Section 7.1, we discuss the lat- 
ter, and Section 7.2 discusses the former. This section also 
addresses the more general problem of estimating the red- 
shift distribution of a photometric sample in which other 
constraints exist for the sample's redshift distribution. Fi- 
nally, in Section 7.3 we discuss how our results can be used 
to statistically clean diffuse background maps. 

7.1 Spectroscopic calibration 

Consider binning the photometric sample by some property
that we refer to as its "photo-z", and we denote the sample
in photometric redshift bin 'm' as 'pm'. One can think of $m$
as, for example, indexing a probability distribution of the
sample's redshift as estimated from photometry. The goal is
to use cross-correlations with a spectroscopic sample to constrain
this probability distribution. The primary difference
between the calculations in prior sections and this calculation
is that the fluctuations from each photometric redshift bin
are more likely localized in redshift than the full photometric
sample. (We defer discussion of internal correlations between
different photo-z bins to Section 7.2.)

If this is the case, our approximate formulae for the 
sensitivities in different limits (Eqs. 44, 45, and 46) are al- 



tered so that ft ^ 

t(p™) ri(pm) 



(H 



tot 



where N- pra> \of m ] is the sky density [linear bias] of the 
photometric galaxies in redshift bin m that are actually at 



redshift i, N. 



LP™) 



(pm) rp(pm) 



Di fe (P™>jvf m \ and 



(T,i[ T } Pm) ?) 1/2 - These relations for ft and 



exact in the distant observer approximation. With these re- 
placements, we can recast our formulae in the rare and abun- 
dant limits for the case of photo-z calibration. 

If the spectroscopic sample is in the rare limit, the po- 
tential constraint on the population in photo-z bin m that 
is actually in redshift bin i follows from Eq. (45) and is 



ST, 



(pm) 



-*(pm) 



0.06 

b, (s) A 



10 4 



-1/2 



l + Z 



(62) 



Note that $\delta T_i^{(pm)}/T^{(pm)}$ equals the outlier fraction for bin
$i \neq m$ in the limit that pm primarily falls in redshift bin $m$
and that the clustering is redshift independent. For pm to be
in the dense galaxy limit (as Eq. 62 assumes) requires that
the redshift span of the photo-z bin is sufficiently concentrated
that $\sum_i [N_i^{(pm)}]^2 C_{ii} > [N_{\rm tot}^{(pm)}]^{-1}$, which roughly
should hold if $dN^{(pm)}/dz$ at the full width half maximum is
greater than $[dN/dz]_{\rm crit}$.

In the contrasting case of a dense spectroscopic and 
photometric sample, it follows from Eq. (42) that 



(pm) 



-,(pm) 



0.03 



/sky \ 
0.001 J 



-1/2 



(lo_ 



(63) 



Eqs. (62) and (63) demonstrate that cross-correlations can
be used to constrain the fractional number (times bias) from
pm in bin $i$ at the part in a hundred level with $10^5$ to $10^6$
spectra per unit redshift (for rare spectra) or $f_{\rm sky} \sim 10^{-3}$
(for high spectral densities).

Fig. 12 presents estimates for how well the redshift distribution
of a photo-z bin can be reconstructed in bins of
size $\Delta z = 0.05$ with cross-correlations for the $z_m = 1.45$
photo-z bin, assuming the "outlier" photo-z's that are not
Figure 12. Estimates for how well the redshift distribution of
sources (times their bias) in the photo-z bin pm can be reconstructed
with cross-correlations, assuming redshift bins of size
$\Delta z = 0.05$. Our calculations assume that much of pm resides
in the $z_m = 1.45$ bin, with "outlier" galaxies distributed uniformly
in the range $0 < z < 2.5$, and that the number density
at $z_m$ is that of a survey complete to $i^{(p)} = 25.3$ unless specified
otherwise. The solid curves assume half of the galaxies in
this photo-z bin reside outside of it, uniformly distributed so that
$N_i^{(pm)}/N_{\rm tot}^{(pm)} = 10^{-2}$ for $i \neq m$. The dashed curves are the
same but for an outlier fraction of $N_i^{(pm)}/N_{\rm tot}^{(pm)} = 10^{-3}$ (so that
most galaxies reside at $z_m$). The top panel shows the constraints
from different spectroscopic samples with the specified constant
$dN^{(s)}/dz$ over $0 < z < 2.5$ and with $f_{\rm sky}$ adjusted so that there
are $10^5$ total spectra. The middle panel shows three different spectroscopic
samples that could be obtained for the same total telescope
time (with the same specifications as in the top panel in
Fig. 9). The bottom panel is for a spectroscopic sample with the
specifications of BigBOSS and the specified limiting photometric
magnitudes. All curves truncate the summation over $\ell$ at $\ell_{\rm NL}$.



actually at the redshift $z_m$ are distributed uniformly in the
range $0 < z < 2.5$. The solid curves assume half of the
galaxies in this photo-z bin reside outside of it, uniformly
distributed over $i$ bins so that $N_i^{(pm)}/N_{\rm tot}^{(pm)} = 10^{-2}$ for
$i \neq m$. The dashed curves are the same but for an outlier
fraction of $N_i^{(pm)}/N_{\rm tot}^{(pm)} = 10^{-3}$ so that only 5 per cent of
galaxies reside outside the photo-z bin $z_m$. The top panel
shows the constraints from different spectroscopic samples
with the specified $dN^{(s)}/dz$, which is held constant over
$0 < z < 2.5$ and for fixed total number of spectra. This
panel shows that Eq. (62) is in qualitative agreement with
these estimates, noting that here $N^{(s)} = 4 \times 10^4$. (We discuss
the dip at $z_m = 1.45$ below.) Especially for the two
lower number densities, the constraint depends weakly on
the density of spectra as Eq. (62) predicts.
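
The bookkeeping behind these numbers is simple enough to write out. The sketch below reproduces the per-bin outlier fractions and the number of spectra per unit redshift from the quantities stated in the text (bins of $\Delta z = 0.05$ over $0 < z < 2.5$, $10^5$ total spectra); the function names are hypothetical, and the outliers are spread over all bins as an approximation.

```python
def n_redshift_bins(z_min, z_max, delta_z):
    """Number of redshift bins of width delta_z between z_min and z_max."""
    return round((z_max - z_min) / delta_z)

def per_bin_outlier_fraction(total_outlier_fraction, n_bins):
    """Fraction of the photo-z sample in each bin if outliers are spread uniformly."""
    return total_outlier_fraction / n_bins

n_bins = n_redshift_bins(0.0, 2.5, 0.05)

# Solid curves: half of the galaxies lie outside the photo-z bin.
solid_per_bin = per_bin_outlier_fraction(0.5, n_bins)

# Dashed curves: a per-bin fraction of 1e-3 implies this total outlier fraction.
dashed_total = 1.0e-3 * n_bins

# Spectra per unit redshift for 1e5 spectra spread over 0 < z < 2.5.
spectra_per_dz = 1.0e5 / 2.5

print(n_bins, solid_per_bin, dashed_total, spectra_per_dz)
```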

The middle panel in Fig. 12 is for the photometric
sample with the specifications of the LSST gold sample
($i^{(p)} = 25.3$) and for different spectroscopic samples that
could be obtained for the same total telescope time (with
the same specifications as in the top panel in Fig. 9). In this
case, both the photometric and spectroscopic galaxies are
at least marginally in the dense limit such that Eq. (63) applies,
and the sensitivity scales roughly as $f_{\rm sky}^{1/2}$. In the three
cases plotted, $f_{\rm sky}$ equals $2.5 \times 10^{-4}$, $10^{-3}$, and $4 \times 10^{-2}$.
The predictions in this panel depend weakly on the outlier
fraction (compare the solid and dashed curves, which in two
of the cases lie on top of each other). The sensitivity of followup
to $i^{(s)} = 21$ also falls off substantially at the highest
redshift, which reflects that the spectroscopic galaxies are
entering the rare regime.

The bottom panel shows the case of a spectroscopic
sample with the specifications of BigBOSS and the specified
limiting photometric magnitudes, assuming that the surveys
overlap over $10^4$ deg$^2$. The BigBOSS sample is on the
borderline of the rare limit (especially at the lowest and highest
$z$) such that this panel is the most difficult to relate to our
predictions. The rare-abundant limit given by Eq. (62) appears
to apply since BigBOSS has $N^{(s)} \sim 10^7$ at $z \sim 1$ for the
case of $i^{(pm)} = 25$ (Fig. 1). However, it does not hold for the
$i^{(pm)} = 23$ case, as $i^{(pm)} = 23$ is also on the borderline of
being in the rare limit with $dN^{(pm)}/dz = 5{,}000$ deg$^{-2}$ at $z_m$,
which results in a loss in sensitivity compared to $i^{(pm)} = 25$.

Auto-correlations (which were dropped in the derivations
that led to Eqs. 62 and 63) add additional information.
We find that auto-correlation estimates add little extra
information for redshift bins that contain only a small fraction
of pm galaxies. However, for the redshifts that contain the
bulk of pm, they can improve the constraint by an order
of magnitude. This can be seen by focusing in on the dip
at $z_m = 1.45$ in Fig. 12, which corresponds to the redshift
that contains half or more of the galaxies. Eqs. (62) and
(63) do not predict a dip. Especially with a rare spectroscopic
sample as investigated in the top panel (where the
cross-correlations can be quite noisy) and a low outlier
fraction, much of the constraint on the number at $z_m$ owes to
the large value of the auto-correlation $\langle p_m^2 \rangle$, which indicates
many galaxies are concentrated in a narrow range in redshift.

Eqs. (62) and (63) [and Fig. 12] show that a 0.0015 error
on the fractional number in 'all outlying peaks' in the true
redshift distribution of pm would be difficult to achieve with
spectroscopic cross-correlations (even ignoring that the $b_i^{(p)}$
also need to be constrained to $O(10^{-3} f_c^{-1})$, where $f_c$ is the
contamination fraction). Such sensitivity is required for
uncertainty in the redshift distribution of the lenses not to
be the limiting factor for the next generation of photometric
weak lensing surveys (Bernstein & Huterer 2010). These
issues aside, the case of BigBOSS cross-correlations with
a photometric sample complete to $i^{(pm)} = 25$ over $10^4$ deg$^2$
(green curves in bottom panel of Fig. 12) achieves an error
that is comparable to the Bernstein & Huterer (2010)
requirement. To quantitatively answer the question of whether
a BigBOSS-like survey is sufficient for futuristic weak lensing
surveys requires an analysis of the bias on cosmological
parameters induced by the pattern of uncertainties we find.

Thus far we have ignored prior information on the redshift
distribution of the photo-z subsample pm. Often we have
prior information on the distribution of $N_i^{(pm)}$, e.g. from
the photometric redshift PDF per galaxy
(Lima et al. 2008; Freeman et al. 2009; Sheth & Rossi 2010).
In this case our formalism has only minor modifications.
Appendix A2 reviews how the quadratic estimator formalism
generalizes to include prior information. For a Gaussian
prior on the $N_i$ (dropping pm superscripts for simplicity),
the estimator with a prior becomes

$$\hat N_i = [\hat N_i]_{\rm last} + [F + F_P]^{-1}_{ij} \left( \sum_{\ell, m} (p \;\; s)\, Q_j \begin{pmatrix} p \\ s \end{pmatrix} + [F_P]_{jk} \left( N_{P,k} - [\hat N_k]_{\rm last} \right) \right), \qquad (64)$$

where $F_P$ and $N_{P,i}$ are respectively the inverse covariance
matrix and mean of the prior. The prior pulls the estimated
quantity towards $N_{P,k}$, and this pull dominates if the prior
is more peaked than the likelihood of the data.
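
As a rough illustration of how the prior enters the update, the following Python sketch applies a scalar analogue of Eq. (64) to a toy quadratic likelihood. The function name `newton_step_with_prior` and all numerical values are assumptions made for this example only, and the data term of Eq. (64) is replaced here by the gradient of the toy log-likelihood.

```python
def newton_step_with_prior(n_last, grad_loglike, fisher, n_prior, fisher_prior):
    """One Newton-style update of a single parameter with a Gaussian prior.

    Scalar analogue of Eq. (64): the data gradient and the prior pull are
    added, then scaled by the inverse of the summed curvature.
    """
    prior_pull = fisher_prior * (n_prior - n_last)
    return n_last + (grad_loglike + prior_pull) / (fisher + fisher_prior)


# Toy quadratic log-likelihood peaked at n_data with curvature `fisher`.
# All numbers are made up for illustration.
n_data, fisher = 100.0, 4.0
n_prior, fisher_prior = 80.0, 1.0
n_last = 10.0  # deliberately poor starting guess

grad = fisher * (n_data - n_last)  # gradient of the toy log-likelihood at n_last
n_new = newton_step_with_prior(n_last, grad, fisher, n_prior, fisher_prior)
print(n_new)
```

For a quadratic likelihood a single step lands on the inverse-variance weighted mean of the data-preferred value and the prior mean, and setting `fisher_prior = 0` recovers the prior-free update.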

The final subtlety we address with regard to photo-z
calibration is cosmic magnification. Section 6 showed that
cosmic magnification can be a significant bias if unaccounted for
in redshift estimation of the entire photometric sample.
Magnification is less onerous for photo-z calibration to the extent
that the redshifts of the photo-z samples are constrained.
Appendix C2 addresses how magnification can be accounted 
for in this case. 



7.2 Self calibration of photometric sample 

Self-calibration of redshifts by cross correlating different
photo-z bins within a photometric sample has the potential
to achieve a tighter constraint on the $N_i^{(pm)}$ than calibration
using correlations with spectroscopically identified galaxies,
since spectroscopic samples are likely to either be sparser in
number or distributed over narrower fields than photometric
ones. Self-calibration of a photometric survey with
cross-correlations has been investigated in several studies (Huterer
et al. 2006; Schneider et al. 2006; Benjamin et al. 2010). Here
we show that the maximum sensitivity to $dN^{(pm)}/dz$ that
can be achieved with photometric self-calibrations is
strikingly similar to the previously considered case of abundant
spectroscopic and photometric samples.

For self-calibration to be successful, the redshift
distribution of the photometric sample pm needs to be much
better known than in the case of calibration with spectroscopic
cross-correlations. This is because the redshift of $p_n$ for all $n$
is the only knowledge one has to measure the redshift of pm:
If $p_n$ is not centered around a single redshift, it is unclear
how finite $\langle p_m p_n \rangle$ translates into the redshift distribution of
sample pm. To avoid this difficulty, we assume that most of
sample pm falls into redshift bin $z_m$. This assumption is the
best case scenario, and will allow us to put a lower bound on
the constraint from self calibrations. 17 Thus, the covariance
matrix of the different photo-z bins is

$$B_{mn} = \langle p_m\, p_n \rangle = \sum_{i,j} T_i^{(pm)} T_j^{(pn)} C_{ij} + w^{(pm\,pn)}$$
$$\approx \sum_{i=m,n} T_i^{(pm)} T_i^{(pn)} C_{ii} + w^{(pm\,pn)}, \qquad (65)$$

and we have assumed the same discretization in redshift
to specify both the photometric and actual redshift bins.
In the second line, the sum is evaluated at only one value
of $i$ if $m = n$ (i.e. the auto-correlation). The approximate
equality in the last line follows from assuming $C_{ij}$
is diagonal (as holds in the Limber approximation) and,
with $T_m^{(pm)} \equiv b_m^{(pm)} N_m^{(pm)}$, from keeping
terms that are $O(T_i^{(pm)}/T_m^{(pm)})$ or larger. This is the limit
in which the fraction of catastrophic photo-z's is small and
where the covariance matrix $B_{mn}$ is diagonally dominated.



In this limit, and to lowest order in $a_{m,i}$,
the Fisher matrix with respect to the $T_i^{(pm)}$ is

[equation illegible in source] (66)

where $B_{nn} \approx [T_n^{(pn)}]^2 C_{nn} + w^{(pn\,pn)}$ and the matrix is zero
between other combinations of parameters. The quadratic
estimator for $T_n^{(pm)}$ in this limit can also easily be written
as it only involves correlations between the photometric
samples $m$ and $n$. Thus, in the diagonally dominated limit,
the parameter $T_n^{(pm)}$ only correlates with $T_m^{(pn)}$, and there
is a perfect degeneracy that must be broken by adding a
prior (often catastrophic errors occur in one redshift
direction) or going to higher order terms that are suppressed by
another factor of $a_{m,i}$. In the case of the prior that
constrains $T_m^{(pn)}$ to be zero, many of our previous results hold
as Eq. (66) is the same as Eq. (35) and its other incarnation
in Eq. (42) with the replacement $\beta(z) = 1$ and a slightly
different number dependence. (In fact, we do not need the
additional approximation of $S = 1$, as was made there.)
Thus, if $T_n^{(pn)} > 10^4\, b^{-2}\, \Delta z$ deg$^{-2}$, so that the abundant
limit holds,

[equation illegible in source] (67)



Photometric self-calibration over a significant fraction of
the sky is capable of the part in $10^3$ accuracy required by the
next generation of weak lensing surveys (e.g., Bernstein &
Huterer 2010), but with the same caveat as noted in the
previous subsection that we have not calculated the bias on
cosmological parameters. In addition, this error only applies
to the case of a single catastrophic error direction. If the
latter does not hold, the constraint is likely to be weakened by
the factor $\sqrt{a_{m,i}}$. This conclusion would be improved
somewhat if neighboring photo-z bins had significant overlap in
redshift so that some of the $\sqrt{a_{m,i}}$ were not so small.

More generally, the full covariance matrix of the photo-z
bins, $B_{mn}$ (plus overlapping spectroscopic populations),
can be used as the covariance matrix in the minimum variance
quadratic estimator. This self-calibration estimator is
likely to be more sensitive than the algorithm discussed in
Benjamin et al. (2010), the only self-calibration method that
we are aware of, as that algorithm uses linear combinations
of the $A_{ij}$ that encapsulate a subset of the full covariance
and does not weight scales optimally.

17 This assumption requires a highly artificial top hat photo-z
distribution at $z_m$ for consistency. However, we expect that our
result is more general than this choice.

7.3 Cleaning correlated anisotropies from a map 

Our estimator is optimal for statistically estimating the 
level of (and, hence, cleaning) correlated anisotropies 
from angular cross-correlations between diffuse back- 
ground/foreground maps and spectroscopic galaxies. The 
fractional errors we quote on number are equivalent to
the error with which anisotropies can be statistically
removed. Thus, the survey optimizations for this application
are equivalent to those discussed for $N_i^{(p)}$ estimates. Our
previous calculations suggest that correlating anisotropies can
be cleaned statistically to the 1 per cent level. For wide field
observations of diffuse redshifted 21cm emission, this factor
of 100 could be helpful if extragalactic sources are found to
be a limiting factor. 18 For CMB analyses, cross-correlations
could also be interesting for studying the redshift distribu- 
tion and cleaning of foregrounds. For example, it could bet- 
ter enable the separation of the cosmic infrared background 
(CIB) from CMB anisotropies generated at higher redshift. 
(CIB contamination is currently the limiting factor in
measurements of the kinetic Sunyaev-Zeldovich effect, which
conveniently does not correlate with the $s_i$; Reichardt et al. 2012).
Kashlinsky et al. (2007) investigated correlations on ~ 10' 
scales between diffuse anisotropies in Spitzer and HST deep 
fields. Our results suggest the sensitivity to the clustering 
component would be increased with wider fields (perhaps 
using shallower ground based observations rather than HST, 
since we found that the extremely high number density in 
the HST fields is not useful). 

For diffuse anisotropies, gravitational lensing enters
at second order because lensing preserves surface brightness.
Thus, at large scales its impact on correlating the
anisotropies in a map with the spectroscopic sample is small.
If the "spectroscopic" sample is measured at sufficiently
high redshifts that the magnification-magnification term
becomes important, only then can magnification result in a
linear order diffuse foreground-spectroscopic population
cross-correlation signal. Magnification also has the effect of
correlating the $s_i$, which can bias the estimate. However, both
magnification effects should be correctable as the $\alpha_i^{(s)}$ can
be measured.



8 MOCK SURVEYS 

We are interested in understanding the robustness with
which the proposed estimator converges to the input $N_i^{(p)}$.
To investigate its convergence, mock surveys are generated
18 It is thought that emission from the Galaxy is more problematic
for 21cm measurements, although the contribution of extragalactic
sources to the foregrounds increases with $\ell$ (McQuinn
et al. 2006).




Figure 13. Demonstration of estimator convergence, showing the
distribution of the estimates in units of the Fisher error,
$\Delta N/\sigma_N$, for 1,000 mocks. The thick solid curve is the expected
distribution of estimates. For each mock, we start off with initial
values for the $N_i^{(p)}$ that are each an order of magnitude smaller
than their actual value. The top panel shows the case of a
$10 \times 10$ deg$^2$ field with the specified populations and 10 bins
spanning $0 < z < 1$ (resulting in $\sim 10$ per cent errors). The bottom
is a $30 \times 30$ deg$^2$ field with a photometric sample complete to
$i^{(p)} = 25.3$ and 50 bins spanning $0 < z < 2.5$ (resulting in $\sim 1$ per
cent errors). We find that the estimator robustly converges to its
minimum, even when it starts far from it, and that in both cases
there are zero outliers at $> 5\sigma$ in the 1,000 mocks.

by decomposing the covariance matrix $A$ into its eigenvectors
$e_a$ and eigenvalues $\lambda_a$ for $a \in [0, N_{\rm bin}]$. Then, a
realization of the galaxy field at multipole $\ell$ that has this
covariance matrix is given by

$$g_i(\ell, m) = \sum_{a=0}^{N_{\rm bin}} r_a\, \lambda_a(\ell)^{1/2}\, [e_a(\ell)]_i, \qquad (68)$$

where $r_a$ is a Gaussian deviate with unit variance. Here, $g_i$
corresponds to the overdensity in redshift bin $i$ of the
spectroscopic survey, and $g_0$ is the overdensity in the photometric
sample. Our mocks assume that we are operating in a small
enough patch such that there is a one-to-one mapping
between wavevectors and spherical harmonics. In addition, our
mocks assume linear theory and the Limber approximation.
These approximations should not impact the conclusions per
our previous results. 19
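
The construction in Eq. (68) can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the code used for the mocks in this paper: the function name `draw_mock_modes` and the 3x3 covariance are hypothetical, and the modes are taken to be real for simplicity.

```python
import numpy as np


def draw_mock_modes(cov, n_modes, rng):
    """Draw `n_modes` real Gaussian vectors with covariance `cov`.

    Implements g_i = sum_a r_a * sqrt(lambda_a) * [e_a]_i, where
    (lambda_a, e_a) are the eigenpairs of `cov` and the r_a are
    independent unit-variance Gaussian deviates.
    """
    eigvals, eigvecs = np.linalg.eigh(cov)   # columns of eigvecs are the e_a
    eigvals = np.clip(eigvals, 0.0, None)    # guard against tiny negative round-off
    r = rng.standard_normal((n_modes, cov.shape[0]))
    return (r * np.sqrt(eigvals)) @ eigvecs.T


# Hypothetical covariance at one multipole: index 0 plays the role of the
# photometric sample, indices 1 and 2 of two spectroscopic redshift bins.
cov = np.array([[2.0, 0.6, 0.3],
                [0.6, 1.0, 0.0],
                [0.3, 0.0, 1.5]])
rng = np.random.default_rng(0)
modes = draw_mock_modes(cov, 200_000, rng)

# The sample covariance of the realizations should approach `cov`.
sample_cov = modes.T @ modes / modes.shape[0]
print(np.round(sample_cov, 2))
```

In a full mock the covariance would be rebuilt at every multipole, so the decomposition is repeated for each $\ell$.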



19 These mocks have one significant advantage over a real survey:
they are periodic. Hence, we do not have to worry about the
survey window functions, and different modes on the lattice are
truly independent. We discussed how to deal with these real-world
complications in Section 5.3.

Figure 14. Walk of the estimated $N_i^{(p)}$ as a function of
iteration number in the middle redshift bin for the same two
cross-correlation examples as described in Fig. 13 and the text. The
solid curves are the full minimum variance estimator, and the
dashed curves are the Schur-Limber estimator (which converges
more quickly). The curves terminate after the last iteration
changed the estimated $N_i^{(p)}$ by less than a part in $10^5$ when
averaged over all $i$. The initial guesses for the $N_i^{(p)}$ are taken to be
an order of magnitude too small. The asymptotic value of each
$\hat N_i^{(p)}$ shown in this figure is within $2\sigma$ of the input $N_i^{(p)}$.

We generate 1,000 mocks for two contrasting cases to
illustrate the estimator's performance:

• $10 \times 10$ deg$^2$ field with $dN^{(s)}/dz = 10^3$ deg$^{-2}$,
$dN^{(p)}/dz = 10^4$ deg$^{-2}$, and 10 redshift bins spanning
$0 < z < 1$, each with $1{,}000^2$ pixels, which results in $\sim 10$ per
cent errors on the $N_i^{(p)}$,

• $30 \times 30$ deg$^2$ field with $dN^{(s)}/dz = 10^4$ deg$^{-2}$ and
photometry up to $i^{(p)} = 25.3$, spanning $0 < z < 2.5$ with 50
bins and $300^2$ pixels, which results in $\sim 1$ per cent errors on
the $N_i^{(p)}$.

The resolution of each mock is sufficient to resolve the scales
that contain the bulk of the information (Section 5.3).
Resolving these scales forces these examples to use smaller
areas and higher number densities than many of the previous
examples to achieve sensitivity to the cross-correlation
signal at modest computational expense.

Next, we apply the estimator to the harmonic space
realization of these mocks. (It would be equivalent to apply
our estimator in real space using the results of Section 5.)
Fig. 13 demonstrates that the minimum variance quadratic
estimator converges to the expected Gaussian distribution
of errors. This holds even though we started with initial
estimates for the $N_i^{(p)}$ that are an order of magnitude smaller
than the true values used in the mocks. There are no
outliers from the $5\sigma$ regions plotted in this figure for the cases
shown. Thus, our estimator does not tend to find local
extrema. We find that only when the Fisher errors become
$O(1)$ does the estimator no longer converge properly in all
cases. However, the Schur-Limber estimator in all cases we
investigated successfully converged to the expected
distribution of estimates. This result is not surprising as the
Schur-Limber estimator always minimizes
$\sum_{\ell,m} V_i\,(A_{0i} - p\,s_i)$, with
the $V_i$ being weak functions of the other $N_j^{(p)}$. Thus, it is
advisable to first use the Schur-Limber estimator in cases in
which the $N_i^{(p)}$ are poorly constrained (and often in this
limit the Schur-Limber estimator will in fact be optimal).
Fig. 14 shows the walk of the $N^{(p)}_{N_{\rm bin}/2}$ estimate as a
function of iteration number for the middle redshift bin
in the two cross-correlation cases. The solid curves are the
full minimum variance estimator, and the dashed curves are
the Schur-Limber estimator (which converges more quickly).
The curves terminate when the next successive iteration
changes the estimated $N_i^{(p)}$ by less than a part in $10^6$ when
averaged over all $i$. The Schur-Limber estimator converges
rapidly in both examples (after 3-4 iterations). This similar
convergence rate is despite the two cross-correlation cases
being considerably different in terms of their sensitivity,
their $dN/dz$, and their $N_{\rm bin}$. For the minimum variance
quadratic estimator, convergence requires additional steps,
as many as 20 iterations for the case in the bottom panel.



9 BREAKING THE BIAS-NUMBER
DEGENERACY

Much of our discussion has ignored that cross-correlations 
do not constrain number alone but instead bias times num- 
ber. 20 For many applications, bias times number is in fact 
the quantity of interest, including attempts to measure 3D 
correlations with angular correlations or attempts to sub- 
tract correlated anisotropies from a map of diffuse back- 
grounds. However, knowing the bias is particularly impor- 
tant to the application of calibrating the lens redshifts for 
weak lensing. RSDs as well as lensing magnification formally 
provide terms that break the bias-number degeneracy. How- 
ever, we argued that breaking this degeneracy is unlikely 
with RSDs. Cosmic magnification is more promising: We
argued that surveys capable of percent-level $N_i^{(p)}$
determinations may be able to constrain the bias to 10 per cent.

Other possibilities for breaking this degeneracy require
using additional scales or constraints not included in our
earlier estimates. Such methods to break this degeneracy
include modeling of the one-halo term in $\langle p\,s_i \rangle$; abundance
matching or other modeling methods to map galaxy number
to bias (e.g. Conroy et al. 2006, as $b_i^{(p)}$ is a weak function
of mass for abundant halos); galaxy-galaxy lensing with the
photometric galaxies as both sources and lenses (using the
$b_i^{(p)} N_i^{(p)}$ from cross-correlation measurements, the quantity
needed for the lenses, to constrain $dN^{(p)}/dz$ of the sources);
and measurements of the 2nd order bias, either in the two
point function or higher order statistics. While several of
these avenues appear promising, we shall not pursue them
here.

20 The bias often can be parametrized as a smoothly and slowly
varying function with redshift. An exception is samples with hard
color cuts, where the underlying galaxy population, and hence the
large-scale bias, can change relatively quickly with $z$ at points
where spectral features transition in and out of filters. In such
cases, knowledge of $b_i^{(p)} N_i^{(p)}$ is more difficult to translate into
knowledge about $N_i^{(p)}$.



10 CONCLUSIONS 

Determining the redshift distribution of a particular popu- 
lation of astronomical objects is often quite difficult. How- 
ever, since most cosmological objects are clustered (i.e., they 
trace the same matter field on large scales), objects that are 
close together on the sky are also likely to be close together 
in redshift. Thus, the redshift distribution of a population 
of objects can be determined by cross-correlating it in an- 
gle with a population whose redshift distribution is better 
known. This paper presented a new, optimal estimator for 
the redshift distribution of a given population in terms of 
cross-correlations. We found that this estimator (1) is quite 
intuitive in a number of limits, (2) is straightforward to ap- 
ply to observations, (3) robustly finds the posterior maxi- 
mum, and (4) conveniently selects angular scales at which 
the fluctuations are well approximated as independent be- 
tween redshift bins and at which linear theory applies. In 
addition, we provided analytic formulae that can be used 
to quickly estimate the sensitivity of cross-correlations be- 
tween overlapping surveys to b dN/ dz - the linear bias times 
angular number density per redshift. We compared our es- 
timator to others suggested in the literature, showing that 
it produces considerably smaller errors than the familiar es- 
timator of Newman (2008). 

The optimal estimator's fractional error on the number
of objects (times their bias) in a redshift bin scales as
$\sqrt{10^2 N_{\rm bin}/N^{(s)}}$ if the spectroscopic sample has a mean
angular density of less than a few thousand and the unknown
sample has a mean density larger than this value. Here, $N^{(s)}$
is the total number of spectra per unit redshift, and $N_{\rm bin}$ is
the number of redshift bins spanned by the bulk of the
unknown population. 21 Thus, it is not necessarily better to
use a narrow, deep spectroscopic survey covering tens of
degrees than a wide, shallow one. Once the spectroscopic and
unknown populations have $dN/dz \gg 10^4\, b^{-2}$ deg$^{-2}$, the
sensitivity scales simply with the fraction of sky covered (again
with an intuitive formula) and no longer depends on just
the total number of spectra. We found that upcoming
spectroscopic surveys that aim for millions of spectra can
potentially achieve percent-level constraints on the $b\, dN/dz$ of
an unconstrained population. Furthermore, we showed that
our estimates for the constraints on $b\, dN/dz$ also apply to
spectroscopically calibrating samples binned by their
photometric redshift, and we also commented on the sensitivity
of photometric self-calibration. We found that cross
correlations with upcoming large spectroscopic data sets may be
able to achieve the stringent source redshift calibration
requirements of future weak lensing surveys, but further
modeling is required to quantify this.

21 This formula is analogous to the sensitivity of direct
spectroscopic followup to $dN/dz$, where the error is the square root of
the number of spectra in a redshift bin. It indicates that cross
correlations have an order of magnitude larger error at fixed number
of spectra. However, cross correlations have the significant
advantage of not requiring the spectra to be of the same objects for
which the redshift distribution is desired.
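
For quick estimates, the scaling quoted above for a rare spectroscopic sample and an abundant unknown sample can be evaluated directly. The short Python sketch below does only that arithmetic; the function name `cross_corr_fractional_error` and the example numbers are assumptions chosen for illustration, and the result is meaningful only in the density regime stated in the text.

```python
import math


def cross_corr_fractional_error(n_bin, n_spec_per_dz):
    """Fractional error on bias times number in one redshift bin.

    Evaluates sqrt(10^2 * N_bin / N_s), where `n_bin` is the number of
    redshift bins spanned by the bulk of the unknown population and
    `n_spec_per_dz` is the total number of spectra per unit redshift.
    """
    return math.sqrt(100.0 * n_bin / n_spec_per_dz)


# Hypothetical survey: 10 bins and 10^5 spectra per unit redshift.
error = cross_corr_fractional_error(n_bin=10, n_spec_per_dz=1.0e5)
print(f"fractional error per bin: {error:.3f}")
```

Because the error scales as the inverse square root of the number of spectra, a hundredfold increase in spectra tightens the per-bin constraint by a factor of ten at fixed binning.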

We investigated a number of approximations and how 
they bias the estimator. In the Limber approximation - 
which we found to be excellent - the covariance matrix for 
this problem can be analytically inverted, allowing simple 
expressions for the estimator. We showed that the nearly 
optimal, Limber-approximation estimator can be expressed 
as an iteration of

$$\hat N_i = [\hat N_i]_{\rm last} + \sum V_i \left( p\,s_i - \langle p\,s_i \rangle \right), \qquad (69)$$

where the $V_i$ are weights comprised of intuitive combinations
of the covariance matrix (Eq. 53) and $p\,s_i$ is the
cross-correlation between the unknown sample and the
spectroscopic sample in bin $z_i$. The summations are either
evaluated over bins in angular separation or spherical harmonic
indices depending on whether $p\,s_i$ is measured in
configuration or harmonic space. Furthermore, we found that the
bias from assuming the Limber approximation was minute 
and also argued that the same holds for redshift space dis- 
tortions. We found that cosmic magnification can be a sig- 
nificant source of estimator bias, becoming important once 
surveys achieve < 10 per cent statistical errors (especially if 
the surveys extend to z > 2 or if dN/dz of the unknown sam- 
ple falls off quickly). We discussed strategies for correcting 
this bias. This bias is easy to correct for the case in which 
the unknown population has photo-z estimates or in which
the magnification bias comes from the unknown population 
acting as the lens (which is the scenario that leads to the 
largest bias). 

The techniques developed in this paper can be applied 
to a wide range of existing and upcoming surveys from 
DES, GAMA and WISE, to LSST, Euclid and the SKA. 
We intend to apply this estimator to observational data in 
a future paper.

APPENDIX A: ESTIMATOR DETAILS

This appendix gives two generalizations of the minimum
variance quadratic estimator (Appendix A1), then shows
how a prior would impact the estimator (Appendix A2),
and finally considers how the estimator and variance change
with different basis choices to represent $dN^{(p)}/dz$ (Appendix
A3).



A1 Full Estimator

Here we write two more complete expressions for the
estimator than were given in the text.

First, the estimator given by Eq. (16) is biased by
different cosmic realizations except in the limit in which a large
number of modes are used with comparable weight. The full,
unbiased estimator replaces $F_{ij}$ in Eq. (16) with (Bond et al. 1998)

$$F^{\rm full}_{ij} = F_{ij} + \sum_{\ell,m} {\rm Tr}\left[ \left( \begin{pmatrix} p \\ s \end{pmatrix} (p \;\; s) - A \right) \cdots \right]. \qquad (A1)$$

This expression shows that the estimator is biased by using
$F_{ij}$ rather than $F^{\rm full}_{ij}$ at the level of $N_\ell^{-1/2}$, where $N_\ell$ is the
number of modes that contribute. There are
$N_\ell \approx f_{\rm sky}\,\ell_{\rm nl}^2 \sim 10^6 f_{\rm sky}$ total modes that generally
contribute to the estimator (at least when one sample is abundant).
Thus, this error will impact the estimator at the
$10^{-3} f_{\rm sky}^{-1/2}$ level. This
additional sample variance noise should typically be below
the statistical error. We saw no evidence for this bias in the
estimates from mock surveys in Section 8.
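
The size of this realization-dependent bias follows from simple arithmetic, sketched below in Python. The function name `realization_bias_level` is hypothetical, and the default of $10^6$ modes over the full sky is taken from the estimate quoted above rather than computed from first principles.

```python
def realization_bias_level(f_sky, modes_full_sky=1.0e6):
    """Approximate fractional bias from using F_ij instead of the full
    curvature, taken to scale as the inverse square root of the mode count."""
    n_modes = modes_full_sky * f_sky
    return n_modes ** -0.5


# Example sky fractions (chosen for illustration).
for f_sky in (1.0, 0.25, 0.01):
    print(f_sky, realization_bias_level(f_sky))
```

Even a survey covering one per cent of the sky stays at the per cent level under this scaling, which is why the bias is usually subdominant to the statistical error.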

Secondly, we dropped terms that came from the
dependence of $A_{00}$ on the parameter being varied in the Limber
approximation estimator presented in the text (Eq. 31). The
full estimator in the Limber regime is

[equation illegible in source] (A2)

The auto-correlation terms that were (as a result of our
approximation) omitted in Eq. (31) become important when
$\sum_i A_{0i}^2/[A_{00} A_{ii}] \sim 1$. We found that their effect was most
evident when considering photo-z calibration.



A2 Impact of Prior 

The estimator given in Eq. (16) follows from using the
multi-dimensional Newton's method to find the zeros of the
derivative of the log of the data likelihood function, $\log \mathcal{L}$ (Bond
et al. 1998): 22

$$\hat N_i = [\hat N_i]_{\rm last} - \left( [\log \mathcal{L}]_{,ij} \right)^{-1} [\log \mathcal{L}]_{,j}, \qquad (A3)$$

where $[\log \mathcal{L}]_{,ij}$ is the Hessian of $\log \mathcal{L}$, which upon ensemble
average is the negative of the Fisher matrix. For a Gaussian
likelihood with covariance matrix $C$ and data vector $A$,
$[\log \mathcal{L}]_{,i} = A^T C^{-1} C_{,i} C^{-1} A/2$.

With this derivation in mind, it is straightforward to
generalize Eq. (A3) to include a prior:

$$\hat N_i = [\hat N_i]_{\rm last} - \left( [\log \mathcal{L}]_{,ij} + [\log \mathcal{L}_P]_{,ij} \right)^{-1} \left( [\log \mathcal{L}]_{,j} + [\log \mathcal{L}_P]_{,j} \right), \qquad (A4)$$

where $\mathcal{L}_P$ is the prior likelihood function. The case of a
Gaussian prior on the $N_i$ is given by Eq. (64).
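
The Newton iteration can be illustrated with a one-parameter toy problem, sketched here in Python: estimating the variance of zero-mean Gaussian data, with the Hessian replaced by its ensemble average (minus the Fisher information) as described above. The function name `fisher_newton_variance`, the data values, and the stopping tolerance are assumptions made for this example. In this toy the covariance is linear in the parameter, so the first step already reaches the maximum; the cross-correlation problem generally needs several iterations.

```python
def fisher_newton_variance(data, theta0, tol=1e-6, max_iter=50):
    """Estimate the variance `theta` of zero-mean Gaussian data by Newton
    steps on log L, using -Fisher in place of the Hessian.

    Returns the estimate and the number of iterations performed.
    """
    n = len(data)
    s = sum(x * x for x in data)
    theta = theta0
    for iteration in range(1, max_iter + 1):
        grad = 0.5 * (s / theta**2 - n / theta)  # d log L / d theta
        fisher = 0.5 * n / theta**2              # minus the averaged Hessian
        theta_new = theta + grad / fisher
        if abs(theta_new - theta) <= tol * abs(theta_new):
            return theta_new, iteration
        theta = theta_new
    return theta, max_iter


# Made-up data; the starting guess is an order of magnitude too small.
data = [1.0, -2.0, 0.5, 1.5, -1.0]
theta_hat, n_iter = fisher_newton_variance(data, theta0=0.17)
print(theta_hat, n_iter)
```

The stopping rule mirrors the one used for the mock surveys, where the iteration ends once successive estimates change by less than a small fractional tolerance.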

22 Newton's method is applied to the log of the likelihood rather
than the likelihood itself because Newton's method provides exact
estimates for the extrema of a quadratic function.

As an application of the above, let us consider the case
of our $N_i^{(p)}$ estimator in which the $b_i^{(s)}$ are imperfectly known
and instead are constrained by prior information. Remember
that since the $N_i^{(p)}$ are estimated from cross-correlations,
they are completely degenerate with $b_i^{(s)}$ and can only be
separated with a prior from the auto-correlation
measurements. In this case, the Fisher matrix of the parameters
$N_i^{(p)}$ and $b_i^{(s)}$ plus a prior on $b_i^{(s)}$ yields the new error matrix:

[equation illegible in source] (A5)

where $\sigma_{bs}$ is the standard deviation of the Gaussian prior on
$b_i^{(s)}$ centered on $[b_i^{(s)}]_{\rm prior}$. Our previous results correspond
to $\sigma_{bs} \to 0$. (We are ignoring redshift-bin correlations in
the prior for simplicity, but such correlations can be easily
incorporated.) The fractional variance on a measurement of
$N_i^{(p)}$ is thus

[equation illegible in source] (A6)

Therefore, the fractional variance in the estimated $b_i^{(s)}$ is the
limiting factor when it is larger than the fractional variance
in the estimate of $N_i^{(p)}$ for the case that $b_i^{(s)}$ is held fixed.
The estimator in this limit is

[equation illegible in source] (A7)

with the complementary estimator for the bias being
trivially $\hat b_i^{(s)} = [b_i^{(s)}]_{\rm prior}$.

For the case of SDSS or BOSS quasars (where $N^{(s)} \sim 10^5$),
the uncertainty in the measured bias is $\sigma_{bs} \sim 0.1$ (Ross
et al. 2009; White et al. 2012), which is comparable to the
redshift error expected from cross-correlations (Fig. 7).
However, for rare samples with fewer spectra than SDSS quasars,
the uncertainty in $b_i^{(s)}$ will dominate the error in the $N_i^{(p)}$.



Figure A1. Improvement in constraints from a constrained
parametrization of $dN^{(p)}/dz$ rather than the case considered in
the text in which the $N_i^{(p)}$ are free. Shown are surveys with the
parameters $i^{(p)} = 23$, $dN^{(s)}/dz = 10$ deg$^{-2}$, and $\Delta z = 0.05$ over
1,000 deg$^2$ and $0 < z < 2.5$. The fractional errors for the
unconstrained case (the case investigated in the body of this paper) are
given by the green dashed curve, and the case where $b^{(p)} dN^{(p)}/dz$
is constrained by the functional form $N_0 (z/z_0)^{\alpha} \exp[-(z/z_0)^{\beta}]$,
marginalizing over the parameters specified in the key, is given
by the dot-dashed blue and dotted red curves. This constraining
functional form is evaluated at the fiducial parameters given by Eq. (3)
for these two cases. The black solid curve shows $dN^{(p)}/dz$,
arbitrarily normalized.



A3 Estimator and constraints in other bases 



We have chosen a top hat basis set for convenience, which
also leads to an estimator that converges robustly to the
peak. Other choices are clearly possible, and they may be
preferred in some situations. For example, instead of $N_i^{(p)}$
we could estimate the parameters of a particular functional
form. Or we could expand $dN^{(p)}/dz$ as a sum of overlapping
Gaussians or (orthogonal) polynomials times basis functions
(e.g. a power law times an exponential). While the quadratic
estimator formalism is completely general, it is not trivial
to recast the estimator in terms of an arbitrary basis set
as $A$ needs to be recast in terms of the new parameter set.
In many cases, this is not analytically expressible (with an
exception being the linear case we discuss below). However,
it is trivial to translate our results for the error on a
parameter into another basis set. The new Fisher matrix is given
by the chain rule:



$$F' = W^T F\, W, \qquad (A8)$$

where $W$ is the Jacobian matrix between the $N_i^{(p)}$ and the
new parameter set $\lambda_i$. We showed that the Fisher matrix is
often well approximated as diagonal, such as in the
Schur-Limber limit. In this case

$$F'_{ij} = \sum_{k=1}^{N_{\rm bin}} F_{kk}\, \frac{\partial N_k^{(p)}}{\partial \lambda_i}\, \frac{\partial N_k^{(p)}}{\partial \lambda_j}. \qquad (A9)$$

Once the $N_i^{(p)}$ are estimated with our technique, they can
be combined to estimate the $\lambda_i$ with error given by $F'$.
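
The change of basis is a short matrix operation. The Python sketch below, using NumPy, shows both the general chain-rule form and the diagonal shortcut; the function names, the four-bin Fisher values, and the linear two-parameter model used to build the Jacobian are all hypothetical choices for illustration.

```python
import numpy as np


def fisher_in_new_basis(fisher, jacobian):
    """General chain rule: F' = W^T F W, with W[k, i] = dN_k / dlambda_i."""
    return jacobian.T @ fisher @ jacobian


def fisher_in_new_basis_diagonal(fisher_diag, jacobian):
    """Diagonal shortcut: F'_ij = sum_k F_kk W_ki W_kj."""
    return np.einsum("k,ki,kj->ij", fisher_diag, jacobian, jacobian)


# Hypothetical diagonal Fisher matrix for four redshift bins.
fisher_diag = np.array([4.0, 9.0, 16.0, 25.0])

# Hypothetical linear model N_k = lambda_0 + lambda_1 * k, so that
# dN_k/dlambda_0 = 1 and dN_k/dlambda_1 = k.
jacobian = np.array([[1.0, 0.0],
                     [1.0, 1.0],
                     [1.0, 2.0],
                     [1.0, 3.0]])

fisher_new = fisher_in_new_basis_diagonal(fisher_diag, jacobian)
fisher_new_full = fisher_in_new_basis(np.diag(fisher_diag), jacobian)
param_errors = np.sqrt(np.diag(np.linalg.inv(fisher_new)))
print(fisher_new)
print(param_errors)
```

The marginalized errors on the new parameters are the square roots of the diagonal of the inverse of $F'$, as computed in the last lines.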

Fig. A1 shows an example using Eq. A8 in which we
changed basis to one in which $dN^{(p)}/dz$ is constrained to
have the smooth functional form specified in the key (a
generalization of our Eq. (3) for $P(z, i)$). This figure investigates
the case of a photometric population with — 23 and with 
a low density of spectroscopic objects given by $dN^{(s)}/dz =
10~{\rm deg}^{-2}$, overlapping over a sky area of 1,000 deg$^2$
(although the total number of spectra, here $10^4$, is the
essential quantity). It shows that the constraints are substantially
improved even if a fairly general functional form is assumed
(varying 2 parameters for the dotted curves and 4 for the
dot-dashed). One advantage of parametrizing $dN^{(p)}/dz$ with
a smooth functional form is that the constraints do not
depend on the choice of $\Delta z$.

Finally, we note that the formalism this paper developed
for estimating the $N_i^{(p)}$ can be trivially recast for models
in which one instead aims to constrain some set of basis
functions $\phi_i$ for which $dN^{(p)}/dz = \sum_i c_i \phi_i(z)$,
where $c_i$ are a set of coefficients. In this case, the primary
difference is that for the $a_\ell(k, z_i)$ that went into
calculating $C(\ell)$, the index $i$ no longer indexes the redshift
bin but rather the basis function.
APPENDIX B: EXTENDED LIMBER
APPROXIMATION

The Limber approximation is most applicable on small angu- 
lar scales, where we may approximate the sky as flat and the 
spherical harmonic transform as a Fourier transform (e.g. 
White et al. 1999; Papai & Szapudi 2008). With these ap- 
proximations, the angular correlation function can be writ- 
ten as 



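\begin{align}
w(\theta) &= \int d\chi_1\, d\chi_2\, W(\chi_1) W(\chi_2)
  \int \frac{d^3k}{(2\pi)^3}\, P(k)\, e^{i\mathbf{k}\cdot(\mathbf{x}_1 - \mathbf{x}_2)}, \tag{B1} \\
&\approx \int d\chi\, W^2(\chi) \int dZ \int \frac{d^3k}{(2\pi)^3}\, P(k)\,
  e^{i k_\parallel Z}\, e^{i \mathbf{k}_\perp\cdot\boldsymbol{\theta}\chi}, \tag{B2} \\
&= \int \frac{k_\perp\, dk_\perp}{2\pi}\, P(k_\perp, k_\parallel = 0)
  \int d\chi\, W^2(\chi)\, J_0(k_\perp\chi\theta), \tag{B3}
\end{align}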



where in the second line we have changed variables from
$\chi_i$ to center-of-mass and relative coordinates,
$\chi = (\chi_1 + \chi_2)/2$ and $Z = \chi_1 - \chi_2$, and assumed
$W$ is so broad that $W(\chi \pm Z/2) \approx W(\chi)$ (which is
not always the case for the $W$ considered in the text). Writing
$\ell = k_\perp\chi$ and using $J_0(\ell\theta) \approx P_\ell(\cos\theta)$
for $\theta \ll 1$ and $\ell \gg 1$, the angular power spectrum,
$C_\ell$, is thus



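\begin{equation}
C_\ell = \int \frac{d\chi}{\chi^2}\, W^2(\chi)\,
  P(k_\perp = \ell/\chi,\, k_\parallel = 0). \tag{B4}
\end{equation}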



The Limber approximation further results in correlations 
between non-overlapping redshift slices being zero. 

One can compare the Limber approximation to the analytic
solution for certain cases to see when and how well
these approximations work. Let us assume $W(\chi)$ is a top-hat
in $\chi$ in slices of width $\Delta\chi$ (as in the main body of
this paper). Then, the cross-spectrum is



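\begin{equation}
\ell^2 C_{ij} = k_\perp^2 \int \frac{dk_\parallel}{2\pi}\,
  e^{i k_\parallel(\chi_i - \chi_j)}\,
  {\rm sinc}^2\!\left[\frac{k_\parallel\Delta\chi}{2}\right]
  P(k_\perp, k_\parallel). \tag{B5}
\end{equation}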

Using the method of steepest descents (or approximating the
power spectrum as a power-law and using the asymptotic
behavior of the resulting Bessel functions), it can be shown
that for $k_\perp|\chi_i - \chi_j| \gg 1$

\begin{equation}
\ell^2 C_{ij} \approx \delta_{ij}\, \frac{k_\perp^2}{\Delta\chi}\, P(k_\perp). \tag{B6}
\end{equation}



We can make further progress by assuming that $P(k)$ is a
power-law. In particular, if $P(k)$ is a power-law with index
$-2$, roughly the index on galaxy scales in our Universe, the
integral in Eq. B5 has simple poles that make the evaluation
trivial:



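\begin{equation}
\frac{\ell^2 C_{ij}}{\ell^2 C_{ii}^{\rm Limber}} = \frac{1}{k_\perp\Delta\chi}
\begin{cases}
k_\perp\Delta\chi + \exp[-k_\perp\Delta\chi] - 1 & i = j; \\
\exp[-k_\perp|\chi_i - \chi_j|]\,\left(\cosh[k_\perp\Delta\chi] - 1\right) & i \neq j.
\end{cases} \tag{B7}
\end{equation}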

Note that when $i = j$ and $k_\perp\Delta\chi > 1$ we recover the Limber
result $\ell^2 C_{ii} \approx (k_\perp^2/\Delta\chi) P(k_\perp)$. In
addition, at $k_\perp\Delta\chi = 2$, Eq. B7 undershoots Limber by
40 per cent, with this percentage decreasing roughly linearly with
increasing $k_\perp\Delta\chi$. The errors from Limber will be smaller
when $P(k)$ has a flatter power-law, as is the case at
$k_\perp\Delta\chi \sim 1$ for the $\Delta\chi$ considered in the text.
That the Limber approximation works so well once $k_\perp\Delta\chi$
moderately exceeds unity helps explain why in the text we find it to
be such a good approximation for our problem. Similarly, for $P(k)$
equal to a constant Eq. B5 can also be evaluated, yielding
$\ell^2 C_{ij} = (k_\perp^2/\Delta\chi)\, P\, \delta_{ij}$. This
is exactly the Limber result, which is not surprising as the
constant term is what is maintained in the Limber approximation.
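The size of the Limber error quoted for Eq. B7 can be checked numerically. The sketch below is only illustrative and rests on stated assumptions: $P(k) \propto k^{-2}$, a top-hat window that enters Eq. B5 as ${\rm sinc}^2[k_\parallel\Delta\chi/2]$, units in which $\Delta\chi = 1$, and hypothetical function names. It evaluates the $i = j$ integral by direct quadrature and compares it with the closed form.

```python
import numpy as np

def limber_ratio_analytic(q):
    """i = j case of Eq. B7, with q = k_perp * Delta chi."""
    return (q + np.exp(-q) - 1.0) / q

def limber_ratio_numeric(q, kmax=2000.0, n=2_000_001):
    """Direct quadrature of Eq. B5 for P ~ k^-2, divided by the Limber result.

    Units: Delta chi = 1, so k here is k_parallel * Delta chi.
    """
    k = np.linspace(-kmax, kmax, n)
    # np.sinc(x) = sin(pi x)/(pi x), so this is [sin(k/2)/(k/2)]^2
    window = np.sinc(k / (2.0 * np.pi)) ** 2
    # P(k_perp, k_par) / P(k_perp) = q^2 / (q^2 + k^2) for an index of -2
    integrand = window * q**2 / (k**2 + q**2)
    dk = k[1] - k[0]
    # Trapezoid rule written out by hand
    integral = dk * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return integral / (2.0 * np.pi)

q = 2.0
ratio_exact = limber_ratio_analytic(q)
ratio_quad = limber_ratio_numeric(q)
undershoot = 1.0 - ratio_exact   # fractional shortfall relative to Limber
```

At `q = 2` the shortfall comes out a little above 0.4, in line with the 40 per cent quoted in the text, and it shrinks as `q` grows.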

Next, consider the impact of redshift-space distortions
(RSDs) in the Limber approximation, which have been neglected
in all of our prior discussion. RSDs could be interesting
for our purposes because they break the $b^{(p)}$-$N^{(p)}$
degeneracy. On linear scales the lowest-order correction owing to
RSDs is to multiply the power spectrum by $1 + 2\beta_i\mu^2$, where
$\mu = k_\parallel/k$ and $\beta_i \approx \Omega_m^{0.6}/b_i$, with
the redefinition of $\chi$ and $k$ to be the analogous redshift-space
quantities (Kaiser 1987; Hamilton 1992). In the Limber approximation,
$|k_\parallel| \lesssim \Delta\chi^{-1}$ and so we expect $\mu \ll 1$
and the correction to be small.

However, how quickly this falls off depends on $W(\chi)$. In the
case of our top hat window function and with the replacement
$P(k_\perp, k_\parallel) \to P(k_\perp)(1 + 2\beta_i\mu^2)$, which is
analogous to the Limber approximation, Eq. B5 can be integrated
analytically, yielding



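\begin{equation}
\ell^2 C_{ii} = \frac{k_\perp^2}{\Delta\chi}\, P(k_\perp)
  \left(1 + \frac{2\beta_i}{k_\perp\Delta\chi}\right), \tag{B8}
\end{equation}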



and with the off-diagonals being zero. Thus, the RSD correction
falls off slowly as $(k_\perp\Delta\chi)^{-1}$ in the case of top hat
$W$. A curiosity is that if we had approximated $\mu$ as
$k_\parallel/k_\perp$, the integral would have diverged. Thus, in the
case of a top hat $W$ the RSD term arises from modes with $\mu \sim 1$.
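The origin of the $(k_\perp\Delta\chi)^{-1}$ scaling in Eq. B8 can be made explicit. The worked integral below is a sketch under the same assumptions as the text (top-hat window, $P(k_\perp, k_\parallel) \to P(k_\perp)(1 + 2\beta_i\mu^2)$ with $\mu^2 = k_\parallel^2/(k_\perp^2 + k_\parallel^2)$), and it uses the standard result $\int_{-\infty}^{\infty} dk\, (1 - \cos ka)/(k^2 + q^2) = (\pi/q)(1 - e^{-qa})$.

```latex
\begin{align}
\ell^2 C_{ii}
 &= k_\perp^2 P(k_\perp) \int \frac{dk_\parallel}{2\pi}\,
    {\rm sinc}^2\!\left[\frac{k_\parallel\Delta\chi}{2}\right]
    \left(1 + \frac{2\beta_i k_\parallel^2}{k_\perp^2 + k_\parallel^2}\right) \\
 &= \frac{k_\perp^2}{\Delta\chi}\, P(k_\perp)
    \left(1 + 2\beta_i\, \frac{1 - e^{-k_\perp\Delta\chi}}{k_\perp\Delta\chi}\right)
\end{align}
```

This reduces to Eq. B8 once $k_\perp\Delta\chi \gg 1$. If $\mu^2$ is instead replaced by $k_\parallel^2/k_\perp^2$, the RSD integrand becomes proportional to $1 - \cos(k_\parallel\Delta\chi)$, which does not decay at large $k_\parallel$; this is the divergence noted in the text.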

However, smoother $W(\chi)$ result in RSDs having a
weaker scaling in the Limber regime. Consider the case in
which $W(\chi)$ is a Gaussian with standard deviation $\sigma$. The
analogous equation to Eq. B5 for this case is



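\begin{equation}
\ell^2 C_{ij} = k_\perp^2 \int \frac{dk_\parallel}{2\pi}\,
  e^{i k_\parallel(\chi_i - \chi_j)}\,
  \exp[-k_\parallel^2\sigma^2]\, P(k_\perp, k_\parallel). \tag{B9}
\end{equation}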



For large $\sigma$ the integral is dominated by small $k_\parallel$,
and we can Taylor series expand about $k_\parallel = 0$ as above. In
this case, the correction due to redshift-space distortions enters at
order $\mathcal{O}([k_\perp\sigma]^{-2})$. The RSD term is similar
(merely increasing by a factor of 2) if one of the two window
functions were much narrower than $\sigma$. In addition, exponential
or triangle window functions also have RSDs entering at
$\mathcal{O}([k_\perp\sigma]^{-2})$.$^{23}$
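For the Gaussian window the expansion can be written out in two lines. This is a sketch for the $i = j$ term of Eq. B9, assuming the same Kaiser factor $1 + 2\beta_i\mu^2$ as in the top-hat case and keeping only the leading term in $k_\parallel^2/k_\perp^2$:

```latex
\begin{align}
\ell^2 C_{ii}
 &= k_\perp^2 P(k_\perp) \int \frac{dk_\parallel}{2\pi}\, e^{-k_\parallel^2\sigma^2}
    \left(1 + \frac{2\beta_i k_\parallel^2}{k_\perp^2 + k_\parallel^2}\right) \\
 &\approx k_\perp^2 P(k_\perp) \int \frac{dk_\parallel}{2\pi}\, e^{-k_\parallel^2\sigma^2}
    \left(1 + \frac{2\beta_i k_\parallel^2}{k_\perp^2}\right)
  = \frac{k_\perp^2 P(k_\perp)}{2\sqrt{\pi}\,\sigma}
    \left(1 + \frac{\beta_i}{(k_\perp\sigma)^2}\right)
\end{align}
```

Unlike the top-hat case, the Gaussian cutoff makes the expanded integral converge, so the RSD term is suppressed by $(k_\perp\sigma)^{-2}$ rather than $(k_\perp\sigma)^{-1}$.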

It is important for our calculations whether the RSDs in Limber
contribute at $\mathcal{O}([k_\perp\sigma]^{-1})$ rather than
$\mathcal{O}([k_\perp\sigma]^{-2})$, where $\sigma$ is the width of our
window function. RSDs would be a promising signal to break the
$b^{(p)}$-$N^{(p)}$ degeneracy if the former scaling held, but are
not in the case of the latter. It may appear with the formalism in
the text, which uses top hat $W_i$, that the
$\mathcal{O}([k_\perp\sigma]^{-1})$ scaling would apply. However,



23 This result that RSDs depend on the smoothness of $W(\chi)$ is
analogous to the finding in Nock et al. (2010). There, the impact
of RSDs on the correlation function measured in a top hat projection
over $\sim 100$ Mpc was shown to be much more significant than
when the effective window was smoothed with a pair-averaging
scheme.
for the case of interest where the $dN^{(p)}/dz$ is a smooth function
that is not known, we posit that one is always in the
regime where the RSD term falls off as $\mathcal{O}([k_\perp\sigma]^{-2})$.
Basis functions can always be chosen that have smooth $W(\chi)$ and
where the RSD terms contribute at $\mathcal{O}([k_\perp\sigma]^{-2})$.
That they contribute at $\mathcal{O}([k_\perp\sigma]^{-1})$ for top hat
windows is a pathological result of our basis choice that implicitly
assumes that the distribution of $dN^{(p)}/dz$ is a histogram with
sharp breaks between redshift steps. Thus, to include RSDs properly
requires a smoother basis set for the $W_i$ than we take in the
text. Because of this added complication, we do not consider
RSDs in our formulae in the text. For the reasons espoused
above and because the modes that contribute to our estimate
are generally safely in the Limber regime, the bias
from ignoring them should be small.



APPENDIX C: MAGNIFICATION BIAS

C1 Effect of magnification

The spatial density of observed galaxies is modulated by an
additional factor that we have ignored so far of $(1 + \delta_\mu)$
owing to lensing magnification (Turner et al. 1984; Fugmann 1988;
Narayan 1989; Hui et al. 2007, 2008). In the weak lensing
regime,

\begin{equation}
\delta_\mu(\hat{n}, z_i) = 2\left(-\alpha_i^{(x)} - 1\right)
  \int_0^{\chi_i} d\chi\, \frac{\chi_i - \chi}{\chi_i}\, \chi\,
  \nabla_\perp^2 \phi(\chi\hat{n}), \tag{C1}
\end{equation}

where $\nabla_\perp^2$ is the comoving Laplacian in the plane
perpendicular to the radial direction and $\alpha_i^{(x)}$ is the
power-law slope of the cumulative number of sources at the survey flux
threshold and redshift $z_i$. (Note that $\alpha_i^{(x)}$ is defined to
be a negative number as long as the cumulative number decreases
with increasing flux.) Thus, magnification generates
additional correlations such that

\begin{equation}
C_{ij} \to C_{ij} + C_{ij}^{\delta\mu}, \tag{C2}
\end{equation}

where $C_{ij}^{\delta\mu}$ is the cross-correlation function between the
galaxy overdensity field in redshift slice $j$ and the magnification
field $\delta_\mu$ of slice $i$, and we are dropping the smaller
$C^{\mu\mu}$ term. In the Limber regime, the expression for the new
terms in Eq. C2 is (Bartelmann & Schneider 2001, their Eq. 7.9)

\begin{equation}
C_{ij}^{\delta\mu} = \left(-\alpha_i^{(x)} - 1\right) \frac{3\Omega_m H_0^2}{c^2}
  \int \frac{d\chi}{\chi\, a}\, W_j(\chi)\, Y_i(\chi)\, D^2(\chi)\,
  P\!\left(\frac{\ell}{\chi}\right) \tag{C3}
\end{equation}

for $i > j$. Otherwise, $C_{ij}^{\delta\mu} = 0$ (we set this to zero
for $i = j$ in our calculations), and we denote the source population
in question by $x$ and lens by $y$, as it could be either the
photometric or spectroscopic sample. Here,

\begin{equation}
Y_i(\chi) = \int_\chi^\infty d\chi'\, W_i(\chi')\, \frac{\chi' - \chi}{\chi'}. \tag{C4}
\end{equation}

Magnification depends only on the bias of the lens and not
the source and so can break the degeneracy between bias and
number. (This dependence may be opaque in our notation
as the $C_{ij}^{\delta\mu}$ enter $A$ multiplied by factors of the bias.)

Noting that $c^2/(3H_0^2\Omega_m) = 2\times10^7$ Mpc$^2$, a
back-of-the-envelope estimate for $C_{ij}^{\delta\mu}$ is

\begin{equation}
C_{ij}^{\delta\mu} \approx \left(-\alpha_i^{(x)} - 1\right)
  \frac{(1 + z_j)\, D^2(\chi_j)\, P(\ell/\chi_j)}{(2\times10^7~{\rm Mpc}^2)\, \chi_j}
  \left(\frac{\chi_i - \chi_j}{\chi_i}\right) \tag{C5}
\end{equation}

when $i > j$, and we have approximated $W_i$ and $W_j$ as
sharply peaked around their respective redshifts. This is similar
to the $C_{jj}$ term without lensing (Eq. 21), differing most
importantly by the factor $[(1 + z_j)\chi_j\Delta\chi_j]/2\times10^7$ Mpc$^2$.
This factor is $\mathcal{O}(10^{-2})$ for populations at $z \sim 1$ and
$N_{\rm bin} \sim 50$, but can be larger for higher redshift populations.
Thus, magnification will add off-diagonal terms that are
$\mathcal{O}(10^{-2})$ of the diagonal terms in $C$ and were zero in our
previous treatment. ($C$ was typically approximated as diagonal in the
text.) The new magnification terms have a larger impact on
the components in $A$ involving $p$, as these terms sum over $i$
and $j$ in $C_{ij}$.



C2 Photo-z calibration with magnification 

Here we discuss how magnification could potentially be corrected
in the application of photo-z calibration investigated
in Section 7.1 (and we use the same notation as introduced
there). We consider a simplified problem in which most of
the $p_m$ photo-z sample is concentrated at redshift $z_m$. Then,
there is a significant bias if the error on $T_i^{(p_m)}/T_m^{(p_m)}$
is comparable to $C_{im}^{\delta\mu}/C_{ii}$, which we just showed is
$\mathcal{O}([N_{\rm bin}]^{-1})$ for $z_i \sim 1$, where the number of
bins here is set by how broad the bins have to be for a single bin to
encompass most of $p_m$.

The minimum variance estimator with a prior on the
$\alpha^{(x)}$ (which enters analogously to the number prior in
Eq. 64) can also be written for this simplified problem. First,
the covariance matrix at some $\ell$ and in the Limber
approximation is



Doi 



pt pra) ] 2 c„ 



+ w 



(pm) 



+ M, 



T (vm) T (s) c ,, + T (pm) T (s) c ^ + w ( : 



(pms) 



(C6) 
(C7) 
(C8) 



where $M_\ell$ encompasses the impact of all magnification on the
photometric sample, and we have dropped terms that
do not contain $T_i^{(p_m)}$ except the off-diagonal dependence of
$T_m^{(p_m)}$. For specified $T_m^{(p_m)}$ and a prior on $\alpha^{(x)}$
with variance $\sigma_\alpha^2$, the minimum variance quadratic
estimator is



where $S' = D_{00}D_{11}(D_{00}D_{11} + D_{01}^2)/\det[D]^2$,
$\alpha^{(x)}$ is set by the prior, we have assumed that $T_m^{(p_m)}$
is well constrained by other cross (and auto) correlations (which is
quite likely), and $F$ also has a simple analytic representation. This
estimator is quite analogous to our previous estimator.

It is instructive to look at the variance on a measurement
of $T_i^{(p_m)}$ in a single mode:



DqqDh + S'jT^T^C^/ja + 1) 



(C10) 



This equation shows that the error on the magnification bias
times $S'$ (the latter term in the numerator) has to be comparable
to the auto power terms (the former term) in order
to change our previously quoted errors in Section 7.1. It also
suggests that it may be desirable to down-weight large-angle
modes where $S'$ is largest (that have the smallest noise) and,
hence, where the fog from lensing is most disruptive.



APPENDIX D: RECURRENCE RELATIONS 
FOR (AND THE EVALUATION OF INTEGRALS 
OVER) SPHERICAL BESSEL FUNCTIONS 

Throughout we need to perform integrals over spherical
Bessel functions. Numerical methods for evaluating spherical
Bessel functions and integrating over them are well
advanced, but do not seem to be widely known. This appendix
gives the details of the algorithms used in this
study. Further details can be found in Miller (1952), Corbato
& Uretsky (1959), Gillman & Fiebig (1988) and Poularikas
(2000), or at
http://www.utdallas.edu/~cantrell/ee6481/lectures/bessresl.pdf.

First we address the evaluation of the $j_\ell$. For small values
of the argument, we use a series expansion of $j_\ell(x)$. For
larger values, we evaluate the $j_\ell$ using a downwardly stable
recurrence relation for $r_\ell = j_\ell/j_{\ell-1}$. Specifically, we
first initialize $r_L$ by setting $j_L(x) = 0$ for $L$ much larger than
any $\ell$ of interest (and $x$). Then the relation

\begin{equation}
r_{\ell-1} = \frac{1}{(2\ell - 1)/x - r_\ell}, \tag{D1}
\end{equation}

which follows from the standard recurrence
$j_{\ell-2}(x) + j_\ell(x) = (2\ell - 1)\, j_{\ell-1}(x)/x$,
is downwardly stable and can be used to find $r_\ell$ for
$0 < \ell < L$. The $j_\ell$ can then be evaluated by moving up the
hierarchy after initializing $j_0(x) = \sin(x)/x$.
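The ratio recurrence of Eq. D1 can be sketched in a few lines. This is a minimal illustration rather than the code used in the study: the function name, the choice of starting order (50 orders above the larger of $\ell_{\rm max}$ and $x$), and the handling of $x = 0$ are all assumptions, it takes $x \geq 0$, and it does not include the small-argument series or any guard for arguments that sit exactly on a zero of one of the $j_\ell$.

```python
import math

def spherical_bessel_upto(lmax, x, extra=50):
    """Return [j_0(x), ..., j_lmax(x)] via the downward ratio recurrence (Eq. D1).

    r[l] = j_l / j_{l-1}; the recurrence starts from r[L] = 0 (i.e. j_L = 0)
    at an order L well above both lmax and x (the offset is a hypothetical choice).
    """
    if x == 0.0:
        return [1.0] + [0.0] * lmax
    L = int(max(lmax, x)) + extra
    r = [0.0] * (L + 1)
    # Downward sweep: r_{l-1} = 1 / ((2l - 1)/x - r_l)
    for l in range(L, 1, -1):
        r[l - 1] = 1.0 / ((2 * l - 1) / x - r[l])
    # Upward sweep from j_0(x) = sin(x)/x using j_l = r_l * j_{l-1}
    j = [math.sin(x) / x]
    for l in range(1, lmax + 1):
        j.append(j[-1] * r[l])
    return j

# Closed forms for the lowest orders, used here only as a cross-check
def j1_closed(x):
    return math.sin(x) / x**2 - math.cos(x) / x

def j2_closed(x):
    return (3.0 / x**2 - 1.0) * math.sin(x) / x - 3.0 * math.cos(x) / x**2

values_at_1 = spherical_bessel_upto(5, 1.0)
values_at_10 = spherical_bessel_upto(5, 10.0)
```

Working with ratios rather than the $j_\ell$ themselves avoids the overflow and rescaling steps that a direct downward recurrence on $j_\ell$ would need.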

Eqs. (13) and (14) are difficult integrals to evaluate owing
to the oscillatory nature of the $j_\ell$. We experimented with
using the scheme suggested in Lucas (1995) of decomposing
the product of $j_\ell$ into a sum of functions that each have a
single oscillatory period at large arguments and then using
the transformations discussed therein on a series where the
$n$th member is our $k$-integral evaluated from $0$ out to the
$n$th zero. This operation removes oscillatory behavior in this
slowly converging series so that it converges more quickly
to the $n \to \infty$ limit, and the integral converges for $n \sim 10$
(Lucas 1995). Experiments with some of the integral terms
indicated that the Lucas (1995) method was much faster
than a brute-force integration, but we were able to find a
simpler implementation which was sufficiently fast and accurate.
In particular, we ended up evaluating these integrals
by brute force, integrating typically out to the 1,000th zero
of the $a_\ell(k, z_i)$ (which were pre-computed and stored in a
table). A slight improvement in the convergence of the integral
was obtained by applying a Gaussian damping to the
integrand, based on the fact that $k_\parallel \gg \ell/\chi$ should not
contribute much to the integral. The details of this damping
did not affect our results.



APPENDIX E: THE POWER-LAW CASE

It is of interest to work through the expressions for the angular
power spectrum and correlation function within the
Limber approximation (depth of survey $\gg$ length scales of
interest) and assuming that the underlying power spectrum
is a power-law.

Recall that within the Limber approximation (Section 3.1)

\begin{equation}
C_\ell = \int d\chi\, P(k)\, \frac{W^2(\chi)}{\chi^2}, \tag{E1}
\end{equation}

where $W(\chi)$ is the projection kernel that defines the 2D
(projected) overdensity in terms of the 3D, and it integrates
to unity against $d\chi$. We shall assume that $W(\chi)$ is peaked
at $\chi_0$ and of width $\Delta\chi$ such that
$k\chi_0 \gg k\Delta\chi \gg 1$ for scales, $k$, which contribute
significantly.

Assuming a power-law power spectrum of the form
$\Delta^2(k) = k^3 P(k)/2\pi^2 = (k/k_*)^{3+n}$, with $-2 < n < -1$,
the real-space 3D correlation function is

\begin{equation}
\xi(r) \equiv \left(\frac{r_0}{r}\right)^{\gamma}
  = \int \frac{dk}{k}\, \Delta^2(k)\, j_0(kr)
  = B_n\, (k_* r)^{-3-n}, \tag{E2}
\end{equation}

where $B_n = -\sin(n\pi/2)\, \Gamma(2 + n)$, which respectively
equals 1.25 and 1 for $n = -3/2$ and $n = -1$ ($B_n$ diverges
as $n \to -3^+$). It follows from Eq. E2 that $\gamma = n + 3$ and
$r_0 = B_n^{1/\gamma}/k_*$.

In the Limber approximation,

\begin{equation}
C_\ell = \frac{2\pi^2}{k_*^3 V} \left(\frac{\ell}{k_*\chi_0}\right)^{n}, \tag{E3}
\end{equation}

where $V = \chi_0^2\Delta\chi$ is the volume per steradian. Using
analogous relations to Eq. E2, the 2D or projected correlation
function is

\begin{equation}
w(\theta) \equiv \left(\frac{\theta}{\theta_*}\right)^{-(n+2)}
  = \frac{\pi A_n}{k_*^3 V}\, (k_*\chi_0)^{-n}\, \theta^{-(n+2)}, \tag{E4}
\end{equation}

where $A_n = 2^{n+1}\, \Gamma(1 + n/2)/\Gamma(-n/2) \approx 2.1$ and 1
for $n = -3/2$ and $n = -1$ ($A_n$ diverges as $n \to -2^+$).

Particularly simple expressions hold in the case $n = -1$
for which $A_n = B_n = 1$, so $\Delta^2 = (k/k_*)^2$,

\begin{equation}
\xi(r) = \left(\frac{r_0}{r}\right)^{2} \quad {\rm where} \quad r_0 = k_*^{-1}, \tag{E5}
\end{equation}

and